commit 62beb979b4e43c361db54fbf3084f876fd2c11da
Author: Andrew Deason <adeason@sinenomine.net>
Date: Mon Jul 16 16:53:34 2018 -0500
afs: Skip bulkstat if stat cache looks full
Currently, afs_lookup() will try to prefetch dir entries for normal
dirs via bulkstat whenever multiple pids are reading that dir.
However, if we already have a lot of vcaches, ShakeLooseVCaches may be
struggling to limit the vcaches we already have. Entering
afs_DoBulkStat can make this worse, since we grab afs_xvcache
repeatedly, we may kick out other vcaches, and we'll possibly create
30 new vcaches that may not even be used before they're evicted.
To try to avoid this, skip running afs_DoBulkStat if it looks like the
stat cache is really full.
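As a rough sketch of the kind of guard described above (the helper name,
threshold, and counters are assumptions, not the actual patch):
    /* Illustrative only; the real counters live in the cache manager. */
    static int stat_cache_looks_full(long nvcaches, long vcache_limit)
    {
        /* Treat the stat cache as "full" once we are at or past the
         * configured limit; in that case skip the bulkstat prefetch. */
        return nvcaches >= vcache_limit;
    }
    /* In the afs_lookup() path, roughly:
     *   if (stat_cache_looks_full(current vcache count, configured limit))
     *       skip afs_DoBulkStat() and do a normal single stat instead. */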
Reviewed-on: https://gerrit.openafs.org/13256
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 9ff45e73cf3d91d12f09e108e1267e37ae842c87)
Change-Id: I1b84ab3bb918252e8db5e4379730a517181bc9d8
Reviewed-on: https://gerrit.openafs.org/14400
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 28794011085d35b293d5e829adadb372b2a2b3fd
Author: Andrew Deason <adeason@sinenomine.net>
Date: Mon Jul 16 16:44:14 2018 -0500
afs: Log warning when we detect too many vcaches
Currently, afs_ShakeLooseVCaches has a kind of warning that is logged
when we fail to free up any vcaches. This information can be useful to
know, since it may be a sign that users are trying to access way more
files than our configured vcache limit, hindering performance as we
constantly try to evict and re-create vcaches for files.
However, the current warning is not clear at all to non-expert users,
and it can only occur for non-dynamic vcaches (which is uncommon these
days).
To improve this, try to make a general determination if it looks like
the stat cache is "stressed", and log a message if so after
afs_ShakeLooseVCaches runs (for all platforms, regardless of dynamic
vcaches). Also try to make the message a little more user-friendly,
and only log it (at most) once per 4 hours.
Determining whether the stat cache looks stressed or not is difficult
and arguably subjective (especially for dynamic vcaches). This commit
draws a few arbitrary lines in the sand to make the decision, so at
least something will be logged in the cases where users are constantly
accessing way more files than our configured vcache limit.
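A small sketch of the rate-limited warning described above (the 4-hour
interval comes from the message; the names here are illustrative):
    /* Illustrative only: log at most once per 4 hours. */
    #define WARN_INTERVAL (4 * 60 * 60)   /* seconds */
    static long last_warned;              /* time of the last warning */
    static void maybe_warn_stressed(long now)
    {
        if (now - last_warned >= WARN_INTERVAL) {
            last_warned = now;
            /* log the user-friendly "stat cache looks stressed" message
             * here (afs_warn() in the real code path). */
        }
    }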
Reviewed-on: https://gerrit.openafs.org/13255
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 0532f917f29bdb44f4933f9c8a6c05c7fecc6bbb)
Change-Id: Ic7260f276e00f3e34541207955df841d4ed27844
Reviewed-on: https://gerrit.openafs.org/14399
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 47ab46e1e9fe56b5bf8147eab0b652e74078cbe7
Author: Andrew Deason <adeason@sinenomine.net>
Date: Mon Jul 16 16:08:13 2018 -0500
afs: Split out bulkstat conditions into a function
Our current if() statement for determining whether we should run
afs_DoBulkStat to prefetch dir entries is a bit large, and grows over
time. Split this logic out into a separate function to make it easier
to maintain, and add some comments to help explain each condition.
This commit should have no visible effects; it's just code
reorganization.
Reviewed-on: https://gerrit.openafs.org/13254
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 19cd454f11997d286bc415e9bc9318a31f73e2c6)
Change-Id: Ida322c518d11787fd794df7534135fbc2dec2935
Reviewed-on: https://gerrit.openafs.org/14398
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 738eb3e4976947657f877ec107d517d63a66a907
Author: Andrew Deason <adeason@sinenomine.net>
Date: Sun Jul 8 15:00:02 2018 -0500
afs: Bound afs_DoBulkStat dir scan
Currently, afs_DoBulkStat will scan the entire directory blob, looking
for entries to stat. If all or almost all entries are already stat'd,
we'll scan through the entire directory, doing nontrivial work on
each entry (we grab afs_xvcache, at least). All of this work is pretty
pointless, since the entries are already cached and so we won't do
anything. If many processes are trying to acquire afs_xvcache, this
can contribute to performance issues.
To avoid this, provide a constant bound on the number of entries we'll
search through: nentries * 4. The current arbitrary limits cap
nentries at 30, so this means we're capping the afs_DoBulkStat search
to 120 entries.
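A sketch of the bounded scan described above (the 4x multiplier and the
cap of 30 come from the message; the variable names are assumptions):
    /* Illustrative only: stop scanning after nentries * 4 slots. */
    int nentries = 30;               /* existing arbitrary bulkstat cap */
    int max_scan = nentries * 4;     /* == 120 entries */
    int scanned = 0;
    /* for (each directory slot) { */
        if (++scanned > max_scan)
            break;                   /* give up; entries are likely cached */
    /*     ... existing per-entry work ... } */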
Reviewed-on: https://gerrit.openafs.org/13253
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit ba8b92401b8cb2f5a5306313c2702cb36cba083c)
Change-Id: Icf82f88328621cb5a9e0ad52f873b8a7d74b1f3a
Reviewed-on: https://gerrit.openafs.org/14397
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit fbba8e4cbf00f32181b5075e8d7a876cccd2a046
Author: Andrew Deason <adeason@sinenomine.net>
Date: Thu Jul 13 17:40:36 2017 -0500
afs: Avoid needless W-locks for afs_FindVCache
The callers of afs_FindVCache must hold at least a read lock on
afs_xvcache; some hold a shared or write lock (and set IS_SLOCK or
IS_WLOCK in the given flags). Two callers (afs_EvalFakeStat_int and
afs_DoBulkStat) currently hold a write lock, but neither of them need
to.
In the optimal case, where afs_FindVCache finds the given vcache, this
means that we unnecessarily hold a write lock on afs_xvcache. This can
impact performance, since afs_xvcache can be a very frequently
accessed lock (a simple operation like afs_PutVCache briefly holds a
read lock, for example).
To avoid this, have afs_DoBulkStat hold a shared lock on afs_xvcache,
upgrading to a write lock when needed. afs_EvalFakeStat_int doesn't
ever need a write lock at all, so just convert it to a read lock.
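A sketch of the locking pattern this describes, using the usual OpenAFS
lock macros (lock-site numbers are placeholders; this is not the literal
diff):
    /* Illustrative only. */
    ObtainSharedLock(&afs_xvcache, 998);     /* cheap: readers proceed */
    tvc = afs_FindVCache(&afid, ..., IS_SLOCK);
    if (!tvc) {
        UpgradeSToWLock(&afs_xvcache, 998);  /* pay for the write lock
                                              * only when creating */
        /* ... create or evict vcaches ... */
        ConvertWToSLock(&afs_xvcache, 998);
    }
    ReleaseSharedLock(&afs_xvcache);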
Reviewed-on: https://gerrit.openafs.org/12656
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 6c808e05adb0609e02cd61e3c6c4c09eb93c1630)
Change-Id: Id517d1098b4c3a02db646b2a74535f77cb684ec3
Reviewed-on: https://gerrit.openafs.org/14396
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 5c476b91fd2259e9c34070be8ba201dd471ad974
Author: Andrew Deason <adeason@sinenomine.net>
Date: Thu Jul 13 17:40:21 2017 -0500
afs: Change VerifyVCache2 calls to VerifyVCache
afs_VerifyVCache is a macro that (on most platforms) effectively
expands to:
    if ((avc->f.states & CStatd)) {
        return 0;
    } else {
        return afs_VerifyVCache2(...);
    }
Some callers call afs_VerifyVCache2 directly, since they already check
for CStatd for other reasons. A few callers currently call
afs_VerifyVCache2, but without guaranteeing that CStatd is not set.
Specifically, in afs_getattr and afs_linux_VerifyVCache, CStatd could
be set while afs_CreateReq drops GLOCK. And in afs_linux_readdir,
CStatd could be cleared at multiple different points before the
VerifyVCache call.
This can result in afs_VerifyVCache2 acquiring a write-lock on the
vcache, even when CStatd is already set, which is an unnecessary
performance hit.
To avoid this, change these call sites to use afs_VerifyVCache instead
of calling afs_VerifyVCache2 directly, which skips the write lock when
CStatd is already set.
Reviewed-on: https://gerrit.openafs.org/12655
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit a05d5b7503e466e18f5157006c1de2a2f7d019f7)
Change-Id: I05bdcb7f10930ed465c24a8d7e51077a027b1a4b
Reviewed-on: https://gerrit.openafs.org/14395
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 150ee65f286409e37fcf94807dbeaf4b79ab0769
Author: Andrew Deason <adeason@sinenomine.net>
Date: Sun Apr 26 17:26:02 2020 -0500
rx: Use _IsLast to check for last call in queue
Ever since commits 170dbb3c (rx: Use opr queues) and d9fc4890 (rx: Fix
test for end of call queue for LWP), rx_GetCall checks if the current
call is the last one on rx_incomingCallQueue by doing this:
opr_queue_IsEnd(&rx_incomingCallQueue, cursor)
But opr_queue_IsEnd checks if the given pointer is the _end_ of the
list; that is, if it's the end-of-list sentinel, not an item on the
actual list. Testing for the last item in a list is what
opr_queue_IsLast is for. This is the same convention that the old Rx
queues used, but 170dbb3c just accidentally replaced queue_IsLast with
opr_queue_IsEnd (instead of opr_queue_IsLast), and d9fc4890 copied the
mistake.
Because this check is inside an opr_queue_Scan loop, opr_queue_IsEnd
will never be true, so we'll never enter this block of code (unless we
are the "fcfs" thread). This means that an incoming Rx call can get stuck
in the incoming call queue, if all of the following are true:
- The incoming call consists of more than 1 packet of incoming data.
- The incoming call "waits" when it comes in (that is, there are no
free threads or the service is over quota).
- The "fcfs" thread doesn't scan the incoming call queue (because it
is idle when the call comes in, but the relevant service is over
quota).
To fix this, just use opr_queue_IsLast here instead of
opr_queue_IsEnd.
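A small standalone model of the distinction (this re-implements the
sentinel-list idea in plain C for illustration; it is not the real
opr_queue API):
    #include <stdio.h>
    struct node { struct node *prev, *next; };  /* circular list, sentinel head */
    /* "IsEnd": cursor is the sentinel itself -- never true inside a scan loop. */
    static int is_end(struct node *head, struct node *cur)  { return cur == head; }
    /* "IsLast": cursor is the final real element before the sentinel. */
    static int is_last(struct node *head, struct node *cur) { return cur->next == head; }
    int main(void)
    {
        struct node head, a, b;
        head.next = &a; a.prev = &head; a.next = &b; b.prev = &a;
        b.next = &head; head.prev = &b;
        for (struct node *cur = head.next; cur != &head; cur = cur->next)
            printf("is_end=%d is_last=%d\n", is_end(&head, cur), is_last(&head, cur));
        /* prints: is_end=0 is_last=0   (node a)
         *         is_end=0 is_last=1   (node b) */
        return 0;
    }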
Reviewed-on: https://gerrit.openafs.org/14158
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit befc72749884c6752c7789479343ba48c7d5cea1)
Change-Id: If724245414798ae7a86dfa048cf99863317aef8e
Reviewed-on: https://gerrit.openafs.org/14394
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 12aed9ef2988a69187910bada5ba7325cb6144ab
Author: Andrew Deason <adeason@dson.org>
Date: Sun Jul 21 18:48:51 2019 -0500
rx: Avoid osi_NetSend during rx shutdown
Commit 8d939c08 (rx: avoid nat ping during shutdown) added a call
to shutdown_rx() inside the DARWIN shutdown sequence, before the rx
socket was closed. From the commit message, it sounds like this was
done to avoid NAT pings from calling osi_NetSend during the shutdown
sequence after the rx socket was closed; calling shutdown_rx() before
closing the socket would cause any connections we had to be destroyed
first, avoiding that.
The problem is that shutdown_rx() is then called when
osi_StopNetIfPoller is called, which is much earlier than some other
portions of the shutdown sequence, some of which may hold references
to e.g. rx connections. If we try to, for instance, destroy an rx
connection after shutdown_rx() is called, we could panic.
An earlier version of that commit (gerrit PS1) just tried to insert a
check before the relevant osi_NetSend call, making us just skip the
osi_NetSend if the shutdown sequence had been started. So to avoid the
above issue, try to implement that approach instead. And instead of
doing it just for NAT pings, we can do it for almost all osi_NetSend
calls (besides those involved in the shutdown sequence itself), by
checking this in rxi_NetSend. Also return an error (ESHUTDOWN) if we
skip the osi_NetSend call, so we're not completely silent about doing
so.
This means we also remove the call to shutdown_rx() inside DARWIN's
osi_StopNetIfPoller(). This allows us to interact with Rx objects
during more of the shutdown process in cross-platform code.
Reviewed-on: https://gerrit.openafs.org/13718
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 9866511bb0a5323853e97e3ee92524198813776e)
Change-Id: Ie62c1a68d8a8889f7a8aa3eff3973c219b45a95c
Reviewed-on: https://gerrit.openafs.org/14393
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 7f59989cf84df1e2077f4fcf5649c9624e79a5d2
Author: Andrew Deason <adeason@dson.org>
Date: Sun Jul 21 18:31:53 2019 -0500
rx: Introduce rxi_NetSend
Introduce a small wrapper around osi_NetSend, called rxi_NetSend. This
small wrapper allows future commits to change the code around our
osi_NetSend calls, without needing to change every single call site,
or every implementation of osi_NetSend.
Change most call sites to use rxi_NetSend, instead of osi_NetSend. Do
not change a few callers in the platform-specific kernel shutdown
sequence, since those call osi_NetSend for platform-specific reasons.
This commit on its own does not change any behavior with osi_NetSend;
it is just code reorganization.
Reviewed-on: https://gerrit.openafs.org/13717
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 2a33a80f7026df6b5e47e42319c55d8b7155675a)
Change-Id: I6af8541953a582d044fb668eb4a91720536bc8e1
Reviewed-on: https://gerrit.openafs.org/14392
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 38f613f5e15552938cb425c68b22c166e35284be
Author: Andrew Deason <adeason@sinenomine.net>
Date: Tue Oct 13 20:18:59 2020 -0500
ubik: Remove unused sampleName
The RPC-L type sampleName and related constant UMAXNAMELEN are not
referenced by anything, and have been unused since OpenAFS 1.0. Remove
the unused definitions.
Reviewed-on: https://gerrit.openafs.org/14386
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 83ce8d41c68a5ebcc84132d77af9024c6d285e05)
Change-Id: I1d6c583d9c630fc9704578ba3329132e16b3a803
Reviewed-on: https://gerrit.openafs.org/14401
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit f7374a883505aa34a3db34ddd1163367c544bb0c
Author: Andrew Deason <adeason@dson.org>
Date: Sat May 2 23:54:55 2020 -0500
afs: Drop GLOCK for RXAFS_GetCapabilities
We are hitting the net here; we certainly should not be holding
AFS_GLOCK while waiting for the server's response.
Found via FreeBSD WITNESS.
Reviewed-on: https://gerrit.openafs.org/14181
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 44b7b93b593371bfdddd0be0ae603f4f8720f78b)
Change-Id: I186e08c89136cc3109fd2519bb0d2abbb52f9ba0
Reviewed-on: https://gerrit.openafs.org/14391
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 02fb067d3fded0eee5c9326c56e66b512a75d71c
Author: Michael Meffie <mmeffie@sinenomine.net>
Date: Mon Mar 23 09:46:05 2020 -0400
build: remove unused LINUX_PKGREL from configure.ac
This change removes the unused LINUX_PKGREL definition from the
configure.ac file.
Commit 6a27e228bac196abada96f34ca9cd57f32e31f5c converted the setting of
the RPM package version and release values in the openafs.spec file from
autoconf to the makesrpm.pl script. That commit left LINUX_PKGREL in
configure.ac because it was still referenced by the Debian packaging,
which was still in-tree at that time.
Commit ada9dba0756450993a8e57c05ddbcae7d1891582 removed the last trace
of the Debian packaging, but missed the removal of the LINUX_PKGREL.
Reviewed-on: https://gerrit.openafs.org/14117
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 8ae4531c5720baff9e11e4b05706eab6c82de5f9)
Conflicts:
configure.ac
Change-Id: I69925f89c52aef32aea5bc308140936517b1aeb0
Reviewed-on: https://gerrit.openafs.org/14363
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 2f2cbff766650e7c25eb332d5385f4a3fca8676d
Author: Andrew Deason <adeason@sinenomine.net>
Date: Mon Apr 20 13:03:15 2020 -0500
ubik: Avoid unlinking garbage during recovery
In urecovery_Interact, if any of our operations fail around
calling DISK_GetFile, we will jump to FetchEndCall and eventually
unlink 'pbuffer'. But if we failed before opening our .DB0.TMP file,
the contents of 'pbuffer' will not be initialized yet.
During most iterations of the recovery loop, the contents of 'pbuffer'
will be filled in from previous loops, and it should always stay the
same, so it's not a big problem. But if this is the first iteration of
the loop, the contents of 'pbuffer' may be stack garbage.
Solve this in two ways. To make sure we don't use garbage contents in
'pbuffer', memset the whole thing to zeroes at the beginning of
urecovery_Interact(). And then to make sure we're not reusing
'pbuffer' contents from previous iterations of the loop, also clear
the first character to NUL each time we arrive at this area of the
recovery code. And avoid unlinking anything if pbuffer starts with a
NUL.
Commit 44e80643 (ubik: Avoid unlinking garbage) fixed the same issue,
but only in the SDISK_SendFile codepath in remote.c.
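A sketch of the guards described above (pbuffer and urecovery_Interact
come from the message; the buffer size and surrounding structure are
assumptions):
    /* Illustrative fragment only. */
    char pbuffer[1024];                   /* size is illustrative */
    /* 1) At the top of urecovery_Interact(): never leave stack garbage. */
    memset(pbuffer, 0, sizeof(pbuffer));
    /* 2) Each time we reach the DISK_GetFile area: forget prior iterations. */
    pbuffer[0] = '\0';
    /* ... build the .DB0.TMP path into pbuffer, open it, fetch the db ... */
    /* 3) On the error path: only unlink if we actually have a path. */
    if (pbuffer[0] != '\0')
        unlink(pbuffer);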
Reviewed-on: https://gerrit.openafs.org/14153
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 98b5ffb52117aefac5afb47b30ce9b87eb2fdebf)
Change-Id: I5cadb88e466ddd326ef1e4138d5b1bf89fdb27dc
Reviewed-on: https://gerrit.openafs.org/14365
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 8993e35578e4ae51dd5e8941f77c18bb975e51af
Author: Andrew Deason <adeason@sinenomine.net>
Date: Fri Sep 18 14:03:37 2020 -0600
WINNT: Make opr_threadname_set a no-op
We don't supply an implementation for opr_threadname_set for WINNT;
don't pretend that we do.
Reviewed-on: https://gerrit.openafs.org/13817
Tested-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit f895a9b51671ffdc920fd9b4284337c5b737a0ef)
Change-Id: Ie8df82550f5420e2b024dea29aae0e39e3ee506f
Reviewed-on: https://gerrit.openafs.org/14369
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 045a97dfbc8dd6a2794b74e16f47984dc5f8eccf
Author: Andrew Deason <adeason@sinenomine.net>
Date: Fri Sep 18 12:11:36 2020 -0600
Move afs_pthread_setname_self to opr
Move the functionality in afs_pthread_setname_self from libutil to
opr, in a new function opr_threadname_set. This allows us to more
easily use the routine in more subsystems, since most code already
uses opr.
Reviewed-on: https://gerrit.openafs.org/13655
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 9d28f7390332c92b3d9e863c6fe70c26db28b5ad)
Change-Id: Ic6bbb615bc3494a7a114a0f4cae711b65ebec111
Reviewed-on: https://gerrit.openafs.org/14368
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 41f4bb48741065da6a69ffcb05e451e5c7dac757
Author: Marcio Barbosa <mbarbosa@sinenomine.net>
Date: Mon Aug 31 19:56:56 2020 +0000
Revert "vos: take RO volume offline during convertROtoRW"
This reverts commit 32d35db64061e4102281c235cf693341f9de9271. While that
commit did fix the mentioned problem, depending on "vos" to set the
volume to be converted as "out of service" is not ideal. Instead, this
volume should be set as offline by the SAFSVolConvertROtoRWvolume RPC,
executed on the volume server.
The proper fix for this problem will be introduced by another commit.
Reviewed-on: https://gerrit.openafs.org/14339
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 85893ac3df0c2cb48776cf1203ec200507b6ce7d)
Change-Id: Ie125d2dae1301ca5a4f8323099e6e42bc57b6d28
Reviewed-on: https://gerrit.openafs.org/14361
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 29cdc0af10cb2a0f4cb4d1b05e72d6f066a1a7a5
Author: Yadavendra Yadav <yadayada@in.ibm.com>
Date: Wed Apr 15 05:33:00 2020 -0500
LINUX: Always crref after _settok_setParentPag
Commit b61eac78 (Linux: setpag() may replace credentials) changed
PSetTokens2 to call crref() after _settok_setParentPag(), since
changing the parent PAG may change our credentials structure. But that
commit did not update the old pioctl PSetTokens, so -setpag
functionality remained broken on Linux for utilities that called the
old pioctl ('klog' is one such utility).
To fix this, we could copy the same code from PSetTokens2 into
PSetTokens. But instead just move this code into _settok_setParentPag
itself, to avoid code duplication. This commit also refactors
_settok_setParentPag a little to make the platform-specific ifdefs a
little easier to read through.
Reviewed-on: https://gerrit.openafs.org/14147
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Yadavendra Yadav <yadayada@in.ibm.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 8002a46125e8224ba697c194edba5ad09e4cfc44)
Change-Id: I6a9d10428fe470cb38e3ca669f273dde0fa9c875
Reviewed-on: https://gerrit.openafs.org/14328
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 9553128008c7656d78759aa685a574906d352f3c
Author: Yadavendra Yadav <yadayada@in.ibm.com>
Date: Wed Apr 15 05:33:00 2020 -0500
LINUX: Copy session keys to parent in SetToken
Commit 48589b5d (Linux: Restore aklog -setpag functionality for kernel
2.6.32+) added code to SetToken() to copy our session keyring to the
parent process, in order to implement -setpag functionality. But this
was removed from SetToken() in commit 1a6d4c16 (Linux: fix aklog
-setpag to work with ktc_SetTokenEx), when the same code was moved to
ktc_SetTokenEx().
Add this code back to SetToken(), so -setpag functionality can work
again with utilities that use older functions such as ktc_SetToken
(e.g. 'klog').
Reviewed-on: https://gerrit.openafs.org/14146
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 826bb826274e48c867b41cb948d031a423373901)
Change-Id: I1ae90d92efa27bce2ff59ff9b9dcca370eaf4730
Reviewed-on: https://gerrit.openafs.org/14327
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 01dfdf2608c5643714da61950637c1e7d1994023
Author: Yadavendra Yadav <yadayada@in.ibm.com>
Date: Fri Aug 21 01:54:00 2020 +0530
afs: Avoid NatPing event on all connection
Inside release_conns_user_server, the connection vector is traversed
and, after a connection is destroyed, a new eligible connection is found
on which the NatPing event will be set. Ideally there should be only one
connection on which NatPing is set, but in the current code, while
traversing all of a server's connections, a NatPing event is set on
every connection to that server. In cases where we have a large number
of connections to a server, this can lead to a huge number of
RX_PACKET_TYPE_VERSION packets being sent to the server.
Since this happens during garbage collection of user structs, the
following steps were used to simulate the issue:
- Ran a script which "cd"s into a volume mount point and then sleeps
for a long time.
- Ran an infinite loop in which the above script was called using
PAG-based tokens (a new connection is created for each PAG).
- Instrumented the code so that the code segment where the NatPing
event is set is reached; mainly, reduced NOTOKTIMEOUT to 60 seconds.
To fix this issue, set NatPing on one connection and, once it is set,
break out of the "for" loop traversing the server's connections.
Reviewed-on: https://gerrit.openafs.org/14312
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
(cherry picked from commit b968875a342ba8f11378e76560b46701f21391e8)
Change-Id: I8c70108ab3eb73ed1d9e598381bc29b87ca42aa0
Reviewed-on: https://gerrit.openafs.org/14364
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit e4103743d91bcdba1c57c850eb07b0b03abe72d6
Author: Andrew Deason <adeason@sinenomine.net>
Date: Mon Jun 22 22:54:52 2020 -0500
volser: Don't NUL-pad failed pread()s in dumps
Currently, the volserver SAFSVolDump RPC and the 'voldump' utility
handle short reads from pread() for vnode payloads by padding the
missing data with NUL bytes. That is, if we request 4k of data for our
pread() call, and we only get back 1k of data, we'll write 1k of data
to the volume dump stream followed by 3k of NUL bytes, and log
messages like this:
1 Volser: DumpFile: Error reading inode 1234 for vnode 5678
1 Volser: DumpFile: Null padding file: 3072 bytes at offset 40960
This can happen if we hit EOF on the underlying file sooner than
expected, or if the OS just responds with fewer bytes than requested
for any reason.
The same code path tries to do the same NUL-padding if pread() returns
an error (for example, EIO), padding the entire e.g. 4k block with
NULs. However, in this case, the "padding" code often doesn't work as
intended, because we compare 'n' (set to -1) with 'howMany' (set to 4k
in this example), like so:
if (n < howMany)
Here, 'n' is signed (ssize_t), and 'howMany' is unsigned (size_t), and
so compilers will promote 'n' to the unsigned type, causing this
conditional to fail when n is -1. As a result, all of the relevant log
messages are skipped, and the data in the dumpstream gets corrupted
(we skip a block of data, and our 'howFar' offset goes back by 1). So
this can result in rare silent data corruption in volume dumps, which
can occur during volume releases, moves, etc.
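The promotion can be demonstrated with a few lines of standalone C
(4096 stands in for the block size used in the example above):
    #include <stdio.h>
    #include <sys/types.h>
    int main(void)
    {
        ssize_t n = -1;          /* pread() error return */
        size_t howMany = 4096;   /* requested block size */
        /* 'n' is converted to size_t, becoming a huge positive value,
         * so the "padding" branch is silently skipped. */
        if (n < howMany)
            printf("padding branch taken\n");
        else
            printf("padding branch skipped (n promoted to %zu)\n", (size_t)n);
        return 0;
    }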
To fix all of this, remove this bizarre NUL-padding behavior in the
volserver. Instead:
- For actual errors from pread(), return an error, like we do for I/O
errors in most other code paths.
- For short reads, just write out the amount of data we actually read,
and keep going.
- For premature EOF, treat it like a pread() error, but log a slightly
different message.
For the 'voldump' utility, the padding behavior can make sense if a
user is trying to recover volume data offline in a disaster recovery
scenario. So for voldump, add a new switch (-pad-errors) to enable the
padding behavior, but change the default behavior to bail out on
errors.
Reviewed-on: https://gerrit.openafs.org/14255
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 4498bd8179e5e93a33468be3c8e7a30e569d560a)
Change-Id: Idf89d70c9d4d650dbf7b73e67c5b71b9bab7c3f4
Reviewed-on: https://gerrit.openafs.org/14367
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 9947625a1d67b4ffdc0582e9081000e34be2b46b
Author: Michael Meffie <mmeffie@sinenomine.net>
Date: Fri May 15 12:01:44 2020 -0400
vos: avoid CreateVolume when restoring over an existing volume
Currently, the UV_RestoreVolume2 function always attempts to create a
new volume, even when doing an incremental restore over an existing
volume. When the volume already exists, the volume creation operation
fails on the volume server with a VVOLEXISTS error. The client will then
attempt to obtain a transaction on the existing volume. If a transaction
is obtained, the incremental restore operation will proceed. If a full
restore is being done, the existing volume is removed and a new empty
volume is created.
Unfortunately, the failed volume creation is logged by the volume
server, and so litters the log file with:
Volser: CreateVolume: Unable to create the volume; aborted, error code 104
To avoid polluting the volume server log with these messages, reverse
the logic in UV_RestoreVolume2. Assume the volume already exists and try
to get the transaction first when doing an incremental restore. Create a
new volume if the transaction cannot be obtained because the volume is
not present. When doing a full restore, remove the existing volume, if
one exists, and then create a new empty volume.
Reviewed-on: https://gerrit.openafs.org/14208
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Marcio Brito Barbosa <marciobritobarbosa@gmail.com>
Tested-by: Marcio Brito Barbosa <marciobritobarbosa@gmail.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit f5051b87a56b3a4f7fd7188cbd16a663eee8abbf)
Change-Id: I422b81e0c800ff655ac8851b2872f4d7160d79a8
Reviewed-on: https://gerrit.openafs.org/14326
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 967aa95fe23f98c911b8a618132442159eda3f77
Author: Yadavendra Yadav <yadayada@in.ibm.com>
Date: Wed Apr 29 05:10:05 2020 +0000
rxkad: Use krb5_enctype_keysize in tkt_DecodeTicket5
Inside the tkt_DecodeTicket5 (rxkad/ticket5.c) function, the key size is
calculated by calling krb5_enctype_keybits and dividing the number of
bits by 8. For 3DES the number of key bits is 168, so the key size comes
out to 21 (168/8). However, the actual key size of a 3DES key is 24.
This key size is passed to _afsconf_GetRxkadKrb5Key, where a key size
comparison happens; since there is a mismatch, it returns AFSCONF_BADKEY.
To fix this issue, get the key size from the krb5_enctype_keysize
function instead of krb5_enctype_keybits. Thanks to John Janosik
(jpjanosi@us.ibm.com) for analyzing and fixing this issue.
Reviewed-on: https://gerrit.openafs.org/14203
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 5d53ed0bdab6fea6d2426691bdef2b6f9cb7f2fe)
Change-Id: I16cd7a803a139802671a8045dac09e10ef4ad6cb
Reviewed-on: https://gerrit.openafs.org/14325
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 28e96b954fe17810d1ebdd9fdd4702511e870a67
Author: Jan Iven <iven.jan@gmail.com>
Date: Tue Sep 1 14:51:25 2020 +0200
ptserver: Remove duplicate ubik_SetLock in listSuperGroups
It looks like a call to ubik_SetLock(.. LOCKREAD) was left in
place in listSuperGroups after locking was moved to ReadPreamble
in commit a6d64d70 (ptserver: Refactor per-call ubik initialisation).
When compiled with 'supergroups', and once contacted by
"pts mem -expandgroups ..", ptserver will therefore abort() with:
Ubik: Internal Error: attempted to take lock twice
This patch removes the superfluous ubik_SetLock.
FIXES 135147
Reviewed-on: https://gerrit.openafs.org/14338
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 696f2ec67b049639abf04905255a7d6173dbb19e)
Change-Id: I62bfe44e374b398278658b61f2ecf9a66fab18ae
Reviewed-on: https://gerrit.openafs.org/14345
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit c7a3154382376ca2e5effdfed2c447c86b9f6eed
Author: Michael Meffie <mmeffie@sinenomine.net>
Date: Wed Jul 1 21:50:09 2020 -0400
redhat: Add make to the dkms-openafs pre-requirements
If `make` is not installed before dkms-openafs, the OpenAFS kernel
module is not built during the dkms-openafs package installation.
The failure happens in the "checking if linux kernel module build works"
configure step, which invokes `make` to check the linux buildsystem.
configure fails when `make` is not available, and gives the unhelpful
suggestion (in this case) of configuring with --disable-kernel-module.
Running the configure.log in the dkms build directory shows:
configure:7739: checking if linux kernel module build works
make -C /lib/modules/4.18.0-193.6.3.el8_2.x86_64/build M=/var/lib/dkms/openafs/...
./configure: line 7771: make: command not found
configure: failed using Makefile:
Avoid this build failure by adding `make` to the list of dkms-openafs
package pre-requirements.
Reviewed-on: https://gerrit.openafs.org/14266
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit e61ab9353e99d3298815296abf6b02c50ebe3df0)
Change-Id: I9b2bb73acabfabc1cf8b5514c8aa66572cc96066
Reviewed-on: https://gerrit.openafs.org/14314
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit ddaf6a74ff3e0aeb4c8b2b15886eaa5c90db59bf
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Fri Aug 28 10:32:01 2020 -0600
INSTALL: document the minimum Linux kernel level
The change associated with gerrit #14300 removed support for older
Linux kernels (2.6.10 and earlier).
The commit 'Import of code from autoconf-archive' (d8205bbb4) introduced
a check for Autoconf 2.64. Autoconf 2.64 was released in 2009.
The commit 'regen.sh: Use libtoolize -i, and .gitignore generated
build-tools' (a7cc505d3) introduced a dependency on libtool's '-i'
option. Libtool supported the '-i' option with libtool 1.9b in 2004.
Update the INSTALL instructions to document a minimum Linux kernel
level and the minimum levels for autoconf and libtool.
Notes: RHEL4 (EOL in 2017) had a 2.6.9 kernel and RHEL5 has a 2.6.18
kernel. RHEL5 has libtool 1.5.22 and autoconf 2.59, RHEL6 has libtool
2.2.6 and autoconf 2.63, and RHEL7 has libtool 2.4.2 and autoconf 2.69.
Reviewed-on: https://gerrit.openafs.org/14305
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 16bae98ec525fa013514fb46398df682d7637ae0)
Change-Id: I7aaf434928204df77851dd2d651d43b86f5b53d2
Reviewed-on: https://gerrit.openafs.org/14331
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit fee39617c350623dc659e7a21287a1791bdff5d2
Author: Andrew Deason <adeason@sinenomine.net>
Date: Wed Apr 1 22:59:38 2020 -0500
vos: Print "done" in non-verbose 'vos remsite'
Currently, 'vos remsite' always prints the message "Deleting the
replication site for volume %lu ...", and then calls VDONE if the
operation is successful. VDONE prints the trailing "done", but only if
-verbose is turned on, and so if -verbose is not specified, the output
of 'vos remsite' looks broken:
$ vos remsite fs1 vicepa vol.foo
Deleting the replication site for volume 1234 ...Removed replication site fs1 /vicepa for volume vol.foo
To fix this, unconditionally print the trailing "done", instead of
going through VDONE, so 'vos remsite' output now looks like this:
$ vos remsite fs1 vicepa vol.foo
Deleting the replication site for volume 1234 ... done
Removed replication site fs1 /vicepa for volume vol.foo
Reviewed-on: https://gerrit.openafs.org/14127
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit f16d40ad26df3ec871f8c73952594ad2e723c9b4)
Change-Id: I4cd7cb2156391004e57cd42437d7174a6bd70992
Reviewed-on: https://gerrit.openafs.org/14311
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit ff5b64d074376992d0be02e19b4da789c20d6bab
Author: Marcio Barbosa <mbarbosa@sinenomine.net>
Date: Thu Feb 13 00:39:00 2020 -0300
vos: take RO volume offline during convertROtoRW
The vos convertROtoRW command converts a RO volume into a RW volume.
Unfortunately, the RO volume in question is not set as "out of service"
during this process. As a result, accesses to the volume being converted
can leave volume objects in an inconsistent state.
Consider the following scenario:
1. Create a volume on host_b and add replicas on host_a and host_b.
$ vos create host_b a vol_1
$ vos addsite host_b a vol_1
$ vos addsite host_a a vol_1
2. Mount the volume:
$ fs mkmount /afs/.mycell/vol_1 vol_1
$ vos release vol_1
$ vos release root.cell
3. Shutdown dafs on host_b:
$ bos shutdown host_b dafs
4. Remove RO reference to host_b from the vldb:
$ vos remsite host_b a vol_1
5. Attach the RO copy by touching it:
$ fs flushall
$ ls /afs/mycell/vol_1
6. Convert RO copy to RW:
$ vos convertROtoRW host_a a vol_1
Notice that FSYNC_com_VolDone fails silently (FSYNC_BAD_STATE), leaving
the volume object for the RO copy set as VOL_STATE_ATTACHED (on success,
this volume should be set as VOL_STATE_DELETED).
7. Add replica on host_a:
$ vos addsite host_a a vol_1
8. Wait until the "inUse" flag of the RO entry is cleared (or force this
to happen by attaching multiple volumes).
9. Release the volume:
$ vos release vol_1
Failed to start transaction on volume 536870922
Volume not attached, does not exist, or not on line
Error in vos release command.
Volume not attached, does not exist, or not on line
To fix this problem, take the RO volume offline during the vos
convertROtoRW operation.
Reviewed-on: https://gerrit.openafs.org/14066
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 32d35db64061e4102281c235cf693341f9de9271)
Change-Id: Ie4cfab2f04a8859ddfcaece371198ac544066770
Reviewed-on: https://gerrit.openafs.org/14310
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 696d9ce83e3556bda2d3945326937c63dd3560ed
Author: Michael Meffie <mmeffie@sinenomine.net>
Date: Tue Apr 19 20:46:33 2016 -0400
ubik: positional io for db reads and writes
The ubik library was written before positional i/o was available and
issues an lseek system call for each database file read and write. This
change converts the ubik database accesses to use positional i/o on
platforms where pread and pwrite are available, in order to reduce
system call load.
The new inline uphys_pread and uphys_pwrite functions are supplied on
platforms which do not supply pread and pwrite. These functions fall
back to non-positional i/o. If these symbols are present in the database
server binary then the server process will continue to call lseek before
each read and write access of the database file.
This change does not affect the whole-file database synchronization done
by ubik during database recovery (via the DISK_SendFile and DISK_GetFile
RPCs), which still uses non-positional i/o. However, that code does not
share file descriptors with the phys.c code, so there is no possibility
of mixing positional and non-positional i/o on the same FDs.
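A sketch of the fallback functions described above, assuming they simply
emulate positional i/o with lseek (the names uphys_pread/uphys_pwrite
come from the message; the bodies here are illustrative):
    #include <sys/types.h>
    #include <unistd.h>
    /* Only used on platforms without a native pread/pwrite. */
    static ssize_t
    uphys_pread(int fd, void *buf, size_t nbyte, off_t offset)
    {
        if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
            return -1;
        return read(fd, buf, nbyte);
    }
    static ssize_t
    uphys_pwrite(int fd, const void *buf, size_t nbyte, off_t offset)
    {
        if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
            return -1;
        return write(fd, buf, nbyte);
    }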
Reviewed-on: https://gerrit.openafs.org/12272
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit f62fb17b3cf1c886f8cfc2fabe9984070dd3eec4)
Change-Id: Iccc8f749c89659f4acebd74a1115425f69610ba8
Reviewed-on: https://gerrit.openafs.org/14142
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Marcio Brito Barbosa <marciobritobarbosa@gmail.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit fffd6e07c87e36cd9a4a36858c3df0c282622195
Author: Michael Meffie <mmeffie@sinenomine.net>
Date: Wed Apr 20 18:17:16 2016 -0400
ubik: remove unnecessary lseeks in uphys_open
The ubik database file access layer has a file descriptor cache to avoid
reopening the database file on each file access. However, the file
offset is reset with lseek on each and every use of the cached file
descriptor, and the file offset is set twice when reading or writing
data records.
This change removes unnecessary and duplicate lseek system calls to
reduce the system call load.
Reviewed-on: https://gerrit.openafs.org/12271
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 6892bfbd701899281b34ee337637d438c7d8f8c6)
Change-Id: I5ea7857796d94eb5b659d868e79b9fea2411f301
Reviewed-on: https://gerrit.openafs.org/14141
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Marcio Brito Barbosa <marciobritobarbosa@gmail.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 438bc2c4caf96485e12eda9e7591fc4bd887ebad
Author: Andrew Deason <adeason@sinenomine.net>
Date: Thu Nov 9 12:47:57 2017 -0600
asetkey: Add new 'delete' command variants
The current 'delete' command from asetkey only lets the user delete
old-style rxkad keys. Add a couple of new variants to allow specifying
the key type and subtype, so the user can delete specific key types
and enctypes if they want.
Reviewed-on: https://gerrit.openafs.org/12767
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 5120409cc998284f2fb0467c2f88030976140341)
Change-Id: I8c762839b50f4faf5e583fb5c510bf2ff9dd2259
Reviewed-on: https://gerrit.openafs.org/14293
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 1c395cd1bcf5770f70dce36ef29f9bd7ba98acaa
Author: Kailas Zadbuke <kailashsz@in.ibm.com>
Date: Thu May 7 23:55:39 2020 -0400
salvaged: Fix "-parallel all<number>" parsing
In the salvageserver, the -parallel option takes an "all<number>"
argument. However, the code does not parse the numeric part correctly.
Due to this, only a single salvageserver instance was running even if a
larger number was provided with the "all" argument.
With this fix, the numeric part of the "all" argument is parsed
correctly and the required number of salvageserver instances is started.
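A sketch of the corrected parsing (the helper and parameter names are
assumptions; the "all<number>" form is from the message):
    #include <stdlib.h>
    #include <string.h>
    /* Illustrative only: return how many salvage processes to run. */
    static int
    parse_parallel(const char *parm)
    {
        int n = 1;
        if (strncmp(parm, "all", 3) == 0)
            parm += 3;            /* skip the "all" prefix ...          */
        if (*parm != '\0')
            n = atoi(parm);       /* ... then parse the numeric part    */
        return n > 0 ? n : 1;
    }
    /* e.g. parse_parallel("all4") == 4, parse_parallel("all") == 1 */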
Reviewed-on: https://gerrit.openafs.org/14201
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 4512d04a9b721cd9052c0e8fe026c93faf6edb9e)
Change-Id: I8910fb514986a404b22256e8a514955a684c8a27
Reviewed-on: https://gerrit.openafs.org/14285
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Kailas Zadbuke <kailashsz@in.ibm.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 38f7e123c33c1c11eb275e0619080ee874244afd
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Tue Jun 16 15:20:20 2020 -0600
tests: Use usleep instead of nanosleep
Commit "Build tests by default" 68f406436cc21853ff854c514353e7eb607cb6cb
changes the build so tests are always built.
On Solaris 10 the build fails because nanosleep is in librt, which we do
not link against.
Replace nanosleep with usleep. This avoids introducing extra configure
tests just for Solaris 10.
Note that with Solaris 11 nanosleep was moved from librt to libc, the
standard C library.
Reviewed-on: https://gerrit.openafs.org/14244
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 22a66e7b7e1d73437a8c26c2a1b45bc4ef214e77)
Change-Id: Ic24c5a149451955b5c130e7b36cec27e02688d83
Reviewed-on: https://gerrit.openafs.org/14291
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 264d6b786c2fd0aff0e4abe835e42868f936afce
Author: Andrew Deason <adeason@sinenomine.net>
Date: Tue Mar 24 11:59:48 2020 -0500
tests: Wait for server start in auth/superuser-t
The auth/superuser-t test runs an Rx server and client in two child
processes. If the client process tries to contact the server before
the server has started listening on its port, some tests involving
RPCs can fail (notably test 39, "Can run a simple RPC").
Normally if we try to contact a server that's not there, Rx will try
resending its packets a few times, but on Linux with AFS_RXERRQ_ENV,
if the port isn't open at all, we can get an ICMP_PORT_UNREACH error,
which causes the relevant Rx call to die immediately with
RX_CALL_DEAD.
This means that if the auth/superuser-t client is only just a bit
faster than the server starting up, tests can fail, since the server's
port is not open yet.
To avoid this, we can wait until the server's port is open before
starting the client process. To do this, have the server process send
a SIGUSR1 to the parent after rx_Init() is called, and have the parent
process wait for the SIGUSR1 (waiting for a max of 5 seconds before
failing). This should guarantee that the server's port will be open by
the time the client starts running.
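A sketch of the parent-side wait (sigtimedwait is one way to implement
the 5-second bound; the actual test code may differ):
    #include <signal.h>
    #include <time.h>
    /* Illustrative only: block SIGUSR1, then wait up to 5 seconds for it. */
    static int
    wait_for_server_ready(void)
    {
        sigset_t set;
        struct timespec timeout = { 5, 0 };
        sigemptyset(&set);
        sigaddset(&set, SIGUSR1);
        sigprocmask(SIG_BLOCK, &set, NULL);
        if (sigtimedwait(&set, NULL, &timeout) < 0)
            return -1;    /* timed out (or error): fail the test run */
        return 0;         /* server signalled: its port is open */
    }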
Note that before commit 086d1858 (LINUX: Include linux/time.h for
linux/errqueue.h), AFS_RXERRQ_ENV was mistakenly disabled on Linux
3.17+, so this issue was probably not possible on recent Linux before
that commit.
Reviewed-on: https://gerrit.openafs.org/14109
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
(cherry picked from commit 66d0f91791695ac585f0511d0dadafd4e570b1bf)
Change-Id: Ia6c06ca9a05e33b3bc35238d9c0d18e7ff339438
Reviewed-on: https://gerrit.openafs.org/14290
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit eaa552d27074cd7e1c862d606916b063e6d89a27
Author: Andrew Deason <adeason@sinenomine.net>
Date: Tue Dec 31 12:04:48 2019 -0600
tests: Introduce afstest_GetProgname
Currently, in tests/volser/vos-t.c we call afs_com_err as
"authname-t", which is clearly a mistake during some code refactoring
(introduced in commit 2ce3fdc5, "tests: Abstract out code to produce a
Ubik client").
We could just change this to "vos-t", but instead of specifying
constant strings everywhere, change this to figure out what the
current command is called, and just use that. Put this code into a new
function, afstest_GetProgname, and convert existing tests to use that
instead of hard-coding the program name given to afs_com_err.
Reviewed-on: https://gerrit.openafs.org/13991
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit a21a2f8edb79d6190976e920a9a90d0878411146)
Change-Id: I3d410d6de132f8a0fffeb9cce32a912fe3bbdc20
Reviewed-on: https://gerrit.openafs.org/14289
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit fe57174d84cab19a13eea9a8f31f0ac71c8cf838
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Wed Aug 5 09:23:36 2020 -0600
butc: fix int to float conversion warning
Building with clang-10 results in 2 warnings/errors associated with
trying to convert 0x7fffffff to a floating point value.
tcmain.c:240:18: error: implicit conversion from 'int' to 'float'
changes value from 2147483647 to 2147483648 [-Werror,
-Wimplicit-int-float-conversion]
if ((total > 0x7fffffff) || (total < 0)) /* Don't go over 2G */
and the same conversion warning on the statement on the following line:
total = 0x7fffffff;
Use floating point and decimal constants instead of the hex constants.
For the test, use 2147483648.0 which is cleanly represented by a float.
Change the comparison in the test from '>' to '>='.
If the total value exceeds 2G, just assign the max value directly to the
return variable.
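A sketch of the adjusted bound check (variable names and the return
variable's type are assumptions; the 2147483648.0 constant and the '>='
comparison are from the message):
    /* Illustrative only: 'total' is accumulated as a float elsewhere. */
    int size;                                  /* return variable (type assumed) */
    if (total >= 2147483648.0 || total < 0)    /* don't go over 2G */
        size = 2147483647;                     /* assign the max directly */
    else
        size = (int)total;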
Reviewed-on: https://gerrit.openafs.org/14277
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 37b55b30c65d0ab8c8eaabfda0dbd90829e2c46a)
Change-Id: I16e0acd893049b01a2c5e4c7e71de3fa40e28d3e
Reviewed-on: https://gerrit.openafs.org/14299
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit e7902252f15acfc28453c531f6fa3b29c9c91b92
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Fri Aug 21 10:37:51 2020 -0600
LINUX 5.9: Remove HAVE_UNLOCKED_IOCTL/COMPAT_IOCTL
Linux-5.9-rc1 commit 'fs: remove the HAVE_UNLOCKED_IOCTL and
HAVE_COMPAT_IOCTL defines' (4e24566a) removed the two referenced macros
from the kernel.
The support for unlocked_ioctl and compat_ioctl were introduced in
Linux 2.6.11.
Remove references to HAVE_UNLOCKED_IOCTL and HAVE_COMPAT_IOCTL using
the assumption that they were always defined.
Notes:
With this change, building against kernels 2.6.10 and older will fail.
RHEL4 (EOL in March 2017) used a 2.6.9 kernel. RHEL5 uses a 2.6.18
kernel.
In linux-2.6.33-rc1 the commit messages for "staging: comedi:
Remove check for HAVE_UNLOCKED_IOCTL" (00a1855c) and "Staging: comedi:
remove check for HAVE_COMPAT_IOCTL" (5d7ae225) both state that all new
kernels have support for unlocked_ioctl/compat_ioctl so the checks can
be removed along with removing support for older kernels.
Reviewed-on: https://gerrit.openafs.org/14300
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 13a49aaf0d5c43bce08135edaabb65587e1a8031)
Change-Id: I6dc5ae5b32031641f4a021a31630390a91d834fe
Reviewed-on: https://gerrit.openafs.org/14315
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 99232fa48020bde90f6b245da4374bc63399654f
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Wed Aug 5 09:19:02 2020 -0600
afs: Set AFS_VFSFSID to a numerical value
Currently when UKERNEL is defined, AFS_VFSFSID is always set to
AFS_MOUNT_AFS, which is a string for many platforms for UKERNEL.
Update src/afs/afs.h to ensure that the define for AFS_VFSFSID is a
numeric value when building UKERNEL.
Clean up the preprocessor indentation in src/afs/afs.h in the area
around the AFS_VFSFSID defines.
Thanks to adeason@sinenomine.net for pointing out a much easier solution
for resolving this problem.
Reviewed-on: https://gerrit.openafs.org/14279
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 446457a1240b88fd94fc34ff5715f2b7f2f3ef12)
Change-Id: Ic0f9c2f1f4baeb9f99da19e1187f1bc9f5d7f824
Reviewed-on: https://gerrit.openafs.org/14297
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 7e2a62fb594a7713a5c694b9e0401b5e3c2648c8
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Wed Aug 5 09:18:49 2020 -0600
afs: Avoid using logical OR when setting f_fsid
Building with clang-10 produces the warning/error message
warning: converting the result of '<<' to a boolean always evaluates
to true [-Wtautological-constant-compare]
for the expression
abp->f_fsid = (AFS_VFSMAGIC << 16) || AFS_VFSFSID;
The message appears because a logical OR '||' is used instead of a
bitwise OR '|'. The result of this expression will always set the f_fsid
member to 1 and not the intended value of AFS_VFSMAGIC combined with
AFS_VFSFSID.
Update the expression to use a bitwise OR instead of the logical OR.
Note: This will change the value stored in the f_fsid that is returned from
statfs.
Using a logical OR has existed since OpenAFS 1.0 for hpux/solaris and in
UKERNEL since OpenAFS 1.5 with the commit 'UKERNEL: add uafs_statvfs'
b822971a.
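The difference is easy to see in a standalone snippet (the constants
here are placeholders, not the real AFS_VFSMAGIC/AFS_VFSFSID values):
    #include <stdio.h>
    int main(void)
    {
        unsigned int magic = 0x1234, fsid = 0x0056;     /* placeholders */
        unsigned int wrong = (magic << 16) || fsid;     /* logical OR: always 1 */
        unsigned int right = (magic << 16) |  fsid;     /* bitwise OR: 0x12340056 */
        printf("logical OR: 0x%x\n", wrong);
        printf("bitwise OR: 0x%x\n", right);
        return 0;
    }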
Reviewed-on: https://gerrit.openafs.org/14292
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit c56873bf95f6325b70e63ed56ce59a3c6b2b753b)
Change-Id: I2e3fe6a84ef6ce73fff933f137d4806efefa5949
Reviewed-on: https://gerrit.openafs.org/14298
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit fc32922236122b9013ed9bdedda8c54c2e8c75d8
Author: Andrew Deason <adeason@sinenomine.net>
Date: Fri May 1 15:02:08 2020 -0500
vlserver: Return error when growing beyond 2 GiB
In the vlserver, when we add a new vlentry or extent block, we grow
the VLDB by doing something like this:
vital_header.eofPtr += sizeof(item);
Since we don't check for overflow, and all of our offset-related
variables are signed 32-bit integers, this can cause some odd behavior
if we try to grow the database to be over 2 GiB in size.
To avoid this, change the two places in vlserver code that grow the
database to use a new function, grow_eofPtr(), which checks for 31-bit
overflow. If we are about to overflow, log a message and return an
error.
See the following for a specific example of our "odd behavior" when we
overflow the 2 GiB limit in the VLDB:
With 1 extent block, we can create 14509076 vlentries successfully. On
the 14509077th vlentry, we'll attempt to write the entry to offset
2147483560 (0x7FFFFFA8). Since a vlentry is 148 bytes long, we'll
write all the way through offset 2147483707 (0x8000003B), which is
over the 31-bit limit.
In the udisk subsystem, this results in writing to page numbers
2097151, and -2097152 (since our ubik pages are 1k, and going over the
31-bit limit causes us to treat offsets as negative). These pages
start at physical offsets 2147482688 (0x7FFFFC40) and -2147483584
(-0x7FFFFFC0) in our vldb.DB0 (where offset is page*1024+64).
Modifying each of these pages involves reading in the existing page
first, modifying the parts we are changing, and writing it back. This
works just fine for 2097151, but of course fails for -2097152. The
latter fails in DReadBuffer when eventually our pread() fails with
EINVAL, and causes ubik to log the message:
Ubik: Error reading database file: errno=22
But when DReadBuffer fails, DReadBufferForWrite assumes this is due to
EOF, and just creates a new buffer for the given page (DNewBuffer).
So, the udisk_write() call ultimately succeeds.
When we go to flush the dirty data to disk when committing the
transaction, after we have successfully written the transaction log,
DFlush() fails for the -2097152 page when the pwrite() call eventually
fails with EINVAL, causing ubik to panic, logging the messages:
Ubik PANIC:
Writing Ubik DB modifications
When the vlserver gets restarted by bosserver, we then process the
transaction log, and perform the operations in the log before starting
up (ReplayLog). The log records the actual data we wrote, not split
into pages, and the log-replaying code writes directly to the db
using uphys_write instead of udisk_write. So, because of this, the
write actually succeeds when replaying the log, since we just write
148 bytes to offset 2147483624 (0x7FFFFFE8), and no negative offsets
are used.
The vlserver will then be able to run, but will be unable to read that
newly-created vlentry, since it involves reading a ubik page beyond
the 31-bit boundary. That means trying to look up that entry will fail
with i/o errors, as will lookups of any entry on the same hash chain as
the new entry (since the new entry will be added to the head of the
hash chain). Listing all entries in the database will also just show
an empty database, since our vital_header.eofPtr will be negative, and
we determine EOF by comparing our current blockindex to the value in
eofPtr.
Reviewed-on: https://gerrit.openafs.org/14180
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
(cherry picked from commit d01398731550b8a93b293800642c3e1592099114)
Change-Id: I72302a8c472b3270e99e58a573f5cf25dd34b9c5
Reviewed-on: https://gerrit.openafs.org/14288
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 03979eaa174de0cff188af451806fb7ae10d3a62
Author: Andrew Deason <adeason@sinenomine.net>
Date: Tue Apr 7 13:15:31 2020 -0500
vlserver: Correctly pad nvlentry for "O" RPCs
For our old-style "O" RPCs (e.g. VL_CreateEntry, instead of
VL_CreateEntryN), vlserver calls vldbentry_to_vlentry to convert to
the internal 'struct nvlentry' format. After all of the sites have
been copied to the internal format, we fill the remaining sites by
setting the serverNumber to BADSERVERID. For nvldbentry_to_vlentry, we
do this for NMAXNSERVERS sites, but for vldbentry_to_vlentry, we do
this for OMAXNSERVERS.
The thing is, both functions are filling in entries for a 'struct
nvlentry', which has NMAXNSERVERS 'serverNumber' entries. So for
vldbentry_to_vlentry, we are skipping setting the last few sites
(specifically, NMAXNSERVERS-OMAXNSERVERS = 13-8 = 5).
This can easily cause our O-style RPCs to write out entries to disk
that have uninitialized sites at the end of the array. For example, an
entry with one site should have server numbers that look like this:
serverNumber = {1, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
That is, one real serverid (a '1' here), followed by twelve
BADSERVERIDs.
But for a VL_CreateEntry call, the 'struct nvlentry' is zeroed out
before vldbentry_to_vlentry is called, and so the server numbers in
the written entry look like this:
serverNumber = {1, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0}
That is, one real serverid (a '1' here), followed by seven
BADSERVERIDs, followed by five '0's.
Most of the time, this is not noticeable, since our code that reads in
entries from disk stops processing sites when we encounter the first
BADSERVERID site (see vlentry_to_nvldbentry). However, if the entry
has 8 sites, then none of the entries will contain BADSERVERID, and so
we will actually process the trailing 5 bogus sites. This would appear
as 5 extra volume sites for a volume, most likely all for the same
server.
For VL_CreateEntry, the vlentry struct is always zeroed before we use
it, so the trailing sites will always be filled with 0. For
VL_ReplaceEntry, the trailing sites will be unchanged from whatever
was read in from the existing disk entry.
To fix this, just change the relevant loop to go through NMAXNSERVERS
entries, so we actually go to the end of the serverNumber (et al)
array.
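A sketch of the corrected padding loop, with simplified stand-ins for
the struct nvlentry fields (the constants 13 and 255 are the ones
quoted above):
    #define NMAXNSERVERS 13
    #define BADSERVERID  255

    struct nvlentry_sketch {
        int serverNumber[NMAXNSERVERS];
    };

    static void
    pad_remaining_sites(struct nvlentry_sketch *nve, int nsites)
    {
        int i;

        /* was "i < OMAXNSERVERS" (8), which left the last 5 slots untouched */
        for (i = nsites; i < NMAXNSERVERS; i++)
            nve->serverNumber[i] = BADSERVERID;
    }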
This may appear similar to commit ddf7d2a7 (vlserver: initialize
nvlentry elements after read). However, that commit fixed a case
involving the old vldb database format (which hopefully is not being
used). This commit fixes a case where we are using the new vldb
database format, but with the old RPCs, which may still be used by old
tools.
Reviewed-on: https://gerrit.openafs.org/14139
Tested-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 7e41ee0bd50d39a356f0435ff370a0a7be40306f)
Change-Id: I68ecc41e7268efd0424e6c129aa914cd04115b59
Reviewed-on: https://gerrit.openafs.org/14287
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 986ee1fd422d1608553af5a6a02bd60349b864ac
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Fri Jun 19 08:01:14 2020 -0600
afs: Avoid panics on failed return from afs_CFileOpen
afs_CFileOpen is a macro that invokes the open "method" of the
afs_cacheOps structure, and for disk caches the osi_UFSOpen function is
used.
Currently osi_UFSOpen will panic if there is an error encountered while
opening a file.
Prepare to handle the osi_UFSOpen function returning NULL instead of
issuing a panic (future commit).
Update callers of afs_CFileOpen to test for an error and to return an
error instead of issuing a panic.
While this commit eliminates some panics, it does not address some of the
more complex cases associated with errors from afs_CFileOpen.
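A sketch of the caller-side pattern this change introduces; the helper
names and surrounding code are illustrative, not the actual
cache-manager callers:
    #include <errno.h>

    struct osi_file;                        /* opaque cache-file handle */
    extern struct osi_file *afs_CFileOpen_sketch(void *ainode);

    static int
    use_cache_file_sketch(void *ainode)
    {
        struct osi_file *tfile = afs_CFileOpen_sketch(ainode);

        if (tfile == NULL)
            return EIO;         /* report the failure instead of panicking */

        /* ... read/write the chunk, then close tfile ... */
        return 0;
    }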
Reviewed-on: https://gerrit.openafs.org/14241
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit d2d27f975df13c3833898611dacff940a5ba3e2a)
Change-Id: Ia30711748b3cffd56eb3120961aa1747dfae0f23
Reviewed-on: https://gerrit.openafs.org/14286
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit a00fe9fd4253b2436ae649b05b329d0d52c8b6f6
Author: Yadavendra Yadav <yadayada@in.ibm.com>
Date: Thu Mar 5 07:21:55 2020 +0000
LINUX: Initialize CellLRU during osi_Init
When the OpenAFS kernel module is loaded, it creates certain entries
in the "proc" filesystem. One of those entries is "CellServDB"; if we
read "/proc/fs/openafs/CellServDB" without starting "afsd", the result
is a crash with a NULL pointer dereference. The crash happens because
CellLRU has not been initialized yet (since "afsd" has not started),
i.e. afs_CellInit has not yet been called, so the "next" and "prev"
pointers are NULL. Inside "c_start()" we do not check for a NULL
pointer while traversing CellLRU, and this causes the crash.
To avoid this, initialize CellLRU during module initialization.
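A minimal sketch of the idea, assuming CellLRU is the usual afs_q-style
circular list head; the initializer is written out by hand here rather
than using whatever list-init macro the real code uses:
    struct afs_q_sketch {
        struct afs_q_sketch *next, *prev;
    };

    static struct afs_q_sketch CellLRU_sketch;

    /* Called from module initialization, before afsd has run, so that a
     * reader walking the list sees an empty ring rather than NULL pointers. */
    static void
    init_cell_lru_sketch(void)
    {
        CellLRU_sketch.next = &CellLRU_sketch;
        CellLRU_sketch.prev = &CellLRU_sketch;
    }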
Reviewed-on: https://gerrit.openafs.org/14093
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 8d90a9d27b0ef28ddcdd3eb041c8a9d019b84b50)
Change-Id: I9ed97d3751943331c9d9bc9dfc73acc24231187b
Reviewed-on: https://gerrit.openafs.org/14284
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 6824d45c2ab83e52350c3f366e4cb6f1eb263088
Author: Kailas Zadbuke <kailashsz@in.ibm.com>
Date: Wed Jun 3 15:44:08 2020 +0530
util: Handle serverLogMutex lock across forks
If a process forks when another thread has serverLogMutex locked, the
child process inherits the locked serverLogMutex. This causes a deadlock
when code in the child process tries to lock serverLogMutex, since we
can never unlock serverLogMutex because the locking thread no longer
exists. This can happen in the salvageserver, since the salvageserver
locks serverLogMutex in different threads, and forks to handle salvage
jobs.
To avoid this deadlock, we register handlers using pthread_atfork()
so that serverLogMutex is acquired before the fork proceeds: the fork
blocks until any worker thread releases serverLogMutex, the forking
thread then holds it across the fork, and the atfork handlers release
it in both the parent and the child once the fork is complete.
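A self-contained sketch of the pthread_atfork() arrangement (handler
names are illustrative):
    #include <pthread.h>

    static pthread_mutex_t serverLogMutex = PTHREAD_MUTEX_INITIALIZER;

    /* prepare: taken before fork(), so no worker holds the lock when the
     * child is created; parent/child: release it on both sides afterwards. */
    static void log_fork_prepare(void) { pthread_mutex_lock(&serverLogMutex); }
    static void log_fork_parent(void)  { pthread_mutex_unlock(&serverLogMutex); }
    static void log_fork_child(void)   { pthread_mutex_unlock(&serverLogMutex); }

    static void
    setup_log_atfork_sketch(void)
    {
        pthread_atfork(log_fork_prepare, log_fork_parent, log_fork_child);
    }
The prepare handler serializes against any worker that currently holds
the lock, which is what keeps the child's copy of the mutex consistent.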
Thanks to Yadavendra Yadav (yadayada@in.ibm.com) for working with me
on this issue.
Reviewed-on: https://gerrit.openafs.org/14239
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit e44d6441c8786fdaaa1fad1b1ae77704c12f7d60)
Change-Id: I09c04c0bd99b10433857ccdaeb4ee6a4cd50f768
Reviewed-on: https://gerrit.openafs.org/14283
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Kailas Zadbuke <kailashsz@in.ibm.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 179a418ea5063785a23e4faf35134f063a6f3e1c
Author: Andrew Deason <adeason@sinenomine.net>
Date: Fri Mar 13 13:00:35 2020 -0500
LINUX: Properly revert creds in osi_UFSTruncate
Commit cd3221d3 (Linux: use override_creds when available) caused us
to force the current process's creds to the creds of afsd during
osi_file.c file ops, to avoid access errors in some cases.
However, one code path in osi_UFSTruncate was missed and does not
revert our creds back to the original user's creds: when the
afs_osi_Stat call fails or deems the truncate unnecessary. In that
case, the calling process keeps afsd's creds after osi_UFSTruncate
returns, causing our subsequent access-checking code to think that the
current process is in the same context as afsd (typically uid 0
without a pag).
This can cause the calling process to appear to transiently have the
same access as non-pag uid 0; typically this will be unauthenticated
access, but could be authenticated if uid 0 has tokens.
To fix this, modify the early return in osi_UFSTruncate to go through
a 'goto done' destructor instead, and make sure we revert our creds in
that destructor.
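A sketch of the control-flow change; the *_sketch stand-ins keep it
compilable outside the kernel, and the real function's locking and stat
details are elided:
    struct cred;
    extern const struct cred *override_creds_sketch(const struct cred *new);
    extern void revert_creds_sketch(const struct cred *old);
    extern int stat_and_check_sketch(void *afile, int asize);
    extern const struct cred *cache_creds_sketch;

    static int
    osi_UFSTruncate_sketch(void *afile, int asize)
    {
        int code;
        const struct cred *saved = override_creds_sketch(cache_creds_sketch);

        code = stat_and_check_sketch(afile, asize);
        if (code)
            goto done;  /* previously returned here, leaving afsd's creds active */

        /* ... perform the actual truncate ... */

     done:
        revert_creds_sketch(saved);     /* always restore the caller's creds */
        return code;
    }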
Thanks to cwills@sinenomine.net for finding and helping reproduce the
issue.
Reviewed-on: https://gerrit.openafs.org/14098
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Jeffrey Hutzelman <jhutz@cmu.edu>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: Cheyenne Wills <cwills@sinenomine.net>
(cherry picked from commit 57b4f4f9be1e25d5609301c10f717aff32aef676)
Change-Id: I714eb2dea9645ffe555f26b5d69707a7afbe8d81
Reviewed-on: https://gerrit.openafs.org/14099
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Jeffrey Hutzelman <jhutz@cmu.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit ee578e92d9f810d93659a9805d0c12084fe2bb95
Author: Jeffrey Hutzelman <jhutz@cmu.edu>
Date: Thu May 2 16:02:47 2019 -0400
Linux: use override_creds when available
Linux may perform some access control checks at the time of an I/O
operation, rather than relying solely on checks done when the file is
opened. In some cases (e.g. AppArmor), these checks are done based on
the current task's creds at the time of the I/O operation, not those
used when the file was opened.
Because of this, we must use override_creds() / revert_creds() to make
sure we are using privileged credentials when performing I/O operations
on cache files. Otherwise, cache I/O operations done in the context of
a task with a restrictive AppArmor profile will fail.
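A kernel-side sketch of the pairing, assuming a saved 'cache_creds'
pointer captured at afsd startup; the wrapper names and the use of
kernel_read() (its 4-argument form is the 4.14+ one) are illustrative:
    #include <linux/cred.h>
    #include <linux/fs.h>

    /* Captured once at afsd startup (sketch; the real code stores this
     * differently). */
    static const struct cred *cache_creds;

    static void
    remember_afsd_creds_sketch(void)
    {
        cache_creds = get_current_cred();   /* take a ref on afsd's creds */
    }

    static ssize_t
    cache_read_sketch(struct file *fp, void *buf, size_t len, loff_t *pos)
    {
        const struct cred *saved = override_creds(cache_creds);
        ssize_t n = kernel_read(fp, buf, len, pos); /* checked against afsd */
        revert_creds(saved);                        /* back to caller's creds */
        return n;
    }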
Reviewed-on: https://gerrit.openafs.org/13751
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit cd3221d3532a28111ad22d4090ec913cbbff40da)
Change-Id: I8955ff6150462fecba9a10a8f99bce9ee8163435
Reviewed-on: https://gerrit.openafs.org/14082
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Jeffrey Hutzelman <jhutz@cmu.edu>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit facff58b840a47853592510617ba7a1da2e3eaa9
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Fri Jul 3 10:35:06 2020 -0600
LINUX 5.8: use lru_cache_add
With Linux-5.8-rc1 commit 'mm: fold and remove lru_cache_add_anon() and
lru_cache_add_file()' (6058eaec), the lru_cache_add_file function is
removed since it was functionally equivalent to lru_cache_add.
Replace lru_cache_add_file with lru_cache_add.
Introduce a new autoconf test to determine if lru_cache_add is present.
For reference, the Linux changes associated with the lru caches:
  __pagevec_lru_add introduced before v2.6.12-rc2
  lru_cache_add_file introduced in v2.6.28-rc1
  __pagevec_lru_add_file replaces __pagevec_lru_add in v2.6.28-rc1
    "vmscan: split LRU lists into anon & file sets" (4f98a2fee)
  __pagevec_lru_add removed in v5.7 with a note to use lru_cache_add_file
    "mm/swap.c: not necessary to export __pagevec_lru_add()" (bde07cfc6)
  lru_cache_add_file removed in v5.8
    "mm: fold and remove lru_cache_add_anon() and lru_cache_add_file()"
    (6058eaec)
  lru_cache_add exported
    "mm: fold and remove lru_cache_add_anon() and lru_cache_add_file()"
    (6058eaec)
OpenAFS will use (see the compat sketch below):
  lru_cache_add on 5.8 kernels
  lru_cache_add_file from 2.6.28 through 5.7 kernels
  __pagevec_lru_add/__pagevec_lru_add_file on pre-2.6.28 kernels
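A compat-shim sketch; the macro and config-symbol names are
illustrative, not necessarily the ones the patch introduces:
    #if defined(HAVE_LINUX_LRU_CACHE_ADD)            /* Linux 5.8+ */
    # define afs_lru_cache_add_page(p)   lru_cache_add(p)
    #elif defined(HAVE_LINUX_LRU_CACHE_ADD_FILE)     /* 2.6.28 through 5.7 */
    # define afs_lru_cache_add_page(p)   lru_cache_add_file(p)
    #endif
    /* pre-2.6.28 kernels keep the __pagevec_lru_add/__pagevec_lru_add_file path */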
Reviewed-on: https://gerrit.openafs.org/14249
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Yadavendra Yadav <yadayada@in.ibm.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 7d85ce221d6ccc19cf76ce7680c74311e4ed2632)
Change-Id: Iba6ef4441687dbf60d227a708e2a032c2c0dc79f
Reviewed-on: https://gerrit.openafs.org/14269
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Laß <lass@mail.uni-paderborn.de>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 335f37be13d2ff954e4aeea617ee66502170805e
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Fri Jul 3 10:34:42 2020 -0600
LINUX 5.8: do not set name field in backing_dev_info
Linux-5.8-rc1 commit 'bdi: remove the name field in struct
backing_dev_info' (1cd925d5838) removes the name field.
Do not set the name field in the backing_dev_info structure if it is
not available. This uses the existing config test
'STRUCT_BACKING_DEV_INFO_HAS_NAME'.
Note: the name field in the backing_dev_info structure was added in
Linux 2.6.32.
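A sketch of the guarded assignment (the variable and the "openafs"
string are illustrative):
    #if defined(STRUCT_BACKING_DEV_INFO_HAS_NAME)
        afs_backing_dev_info->name = "openafs";   /* field exists: 2.6.32 - 5.7 */
    #endif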
Reviewed-on: https://gerrit.openafs.org/14248
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit d8ec294534fcdee77a2ccd297b4b167dc4d5573d)
Change-Id: I3d9e18092db998a4c4f26bd63ee3b75383a53d4c
Reviewed-on: https://gerrit.openafs.org/14268
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Laß <lass@mail.uni-paderborn.de>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit d7fc5bf9bf031089d80703c48daf30d5b15a80ca
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Fri Jul 3 10:33:51 2020 -0600
LINUX 5.8: Replace kernel_setsockopt with new funcs
Linux 5.8-rc1 commit 'net: remove kernel_setsockopt' (5a892ff2facb)
retires the kernel_setsockopt function. In prior kernel commits new
functions (ip_sock_set_*) were added to replace the specific functions
performed by kernel_setsockopt.
Define a new config test 'HAVE_IP_SOCK_SET' if the 'ip_sock_set' functions
are available. The config define 'HAVE_KERNEL_SETSOCKOPT' is no longer
set in Linux 5.8.
Create wrapper functions that replace the kernel_setsockopt calls with
calls to the appropriate Linux kernel function(s) (depending on what
functions the kernel supports).
Remove the unused 'kernel_getsockopt' function (used for building with
pre 2.6.19 kernels).
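One possible shape for such a wrapper, using IP_TOS purely as an
example; the wrapper name is made up, while ip_sock_set_tos() and
kernel_setsockopt() are the real kernel entry points:
    #include <linux/in.h>
    #include <linux/net.h>
    #include <net/ip.h>

    /* Set the IP TOS on a kernel socket with whichever API the kernel offers. */
    static void
    osi_set_tos_sketch(struct socket *sockp, int tos)
    {
    #if defined(HAVE_IP_SOCK_SET)
        ip_sock_set_tos(sockp->sk, tos);              /* Linux 5.8 and later */
    #else
        kernel_setsockopt(sockp, SOL_IP, IP_TOS,
                          (char *)&tos, sizeof(tos)); /* Linux 2.6.19 - 5.7 */
    #endif
    }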
For reference:
Linux 2.6.19 introduced kernel_setsockopt
Linux 5.8 removed kernel_setsockopt and replaced the functionality
with a set of new functions (ip_sock_set_*)
Reviewed-on: https://gerrit.openafs.org/14247
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit c48072b9800759ef1682b91ff1e962f6904a2594)
Change-Id: I2724fad06b1882149d2066d13eced55eff5ee695
Reviewed-on: https://gerrit.openafs.org/14267
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Laß <lass@mail.uni-paderborn.de>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 0f67e733e82a9001f3f9253c5e1880be845d537b
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Thu Apr 2 13:27:50 2020 -0600
LINUX: Include linux/time.h for linux/errqueue.h
The configuration test for errqueue.h fails with an undefined structure
error on a Linux 3.17 (or higher) system. This prevents setting
HAVE_LINUX_ERRQUEUE_H, which is used to define AFS_RXERRQ_ENV.
Linux commit f24b9be5957b38bb420b838115040dc2031b7d0c (net-timestamp:
extend SCM_TIMESTAMPING ancillary data struct), which was picked up in
Linux 3.17, added a structure that uses the timespec structure. After
this commit, we need to include linux/time.h to pull in the definition
of the timespec struct.
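The fix boils down to the include ordering below, as it would appear in
the configure test body:
    #include <linux/time.h>     /* struct timespec, needed by errqueue.h on 3.17+ */
    #include <linux/errqueue.h>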
Reviewed-on: https://gerrit.openafs.org/13950
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 086d185872da5f19447cf5ec7846e7ce5104563f)
Change-Id: I67d01b11c5ea95b8b87832fc833f8fc850ade684
Reviewed-on: https://gerrit.openafs.org/14130
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit 5a14bd0abe83b580f0cc7a200ae963399ab7de5f
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Tue May 26 12:11:28 2020 -0600
vol: Fix format-truncation warning with gcc-10.1
Building with gcc-10.1 produces a warning (error if --enable-checking)
in vol-salvage.c
error: %s directive output may be truncated writing up to 755 bytes
into a region of size 255 [-Werror=format-truncation=]
809 | snprintf(inodeListPath, 255, "%s" OS_DIRSEP "salvage.inodes.%s.%d", tdir, name,
Use strdup/asprintf to allocate the buffer dynamically instead of using
a buffer with a hardcoded size.
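A sketch of the dynamic-allocation approach; the final format argument
is elided in the snippet quoted above, so 'pid' is a hypothetical
placeholder here:
    char *inodeListPath = NULL;

    if (asprintf(&inodeListPath, "%s" OS_DIRSEP "salvage.inodes.%s.%d",
                 tdir, name, pid) < 0) {
        inodeListPath = NULL;   /* allocation failed; treat as an error */
    }
    /* ... use inodeListPath, then free(inodeListPath); ... */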
Reviewed-on: https://gerrit.openafs.org/14207
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit d73680c5f70ee5aeb634a9ec88bf1097743d0f76)
Change-Id: I8d3bf244a70f723f585738905deea7ddfb1bb862
Reviewed-on: https://gerrit.openafs.org/14232
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit d5fc5283e91ea94a67df8364a5b8bf8970ffe934
Author: Michael Meffie <mmeffie@sinenomine.net>
Date: Mon Oct 9 22:16:09 2017 -0400
afsmonitor: remove unused LWP_WaitProcess
Remove the unimplemented once-only flag and the unused LWP_WaitProcess
call.
Reviewed-on: https://gerrit.openafs.org/12745
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 7c27365ea24aed5787f6fc03f30f6085c78ece51)
Change-Id: I3b61f9fb4f45564304b0e35878d3535a10e31d02
Reviewed-on: https://gerrit.openafs.org/14226
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
commit a2eec64374d6b754b29c509b554573cb6e53eb46
Author: Cheyenne Wills <cwills@sinenomine.net>
Date: Fri May 22 12:16:48 2020 -0600
Avoid duplicate definitions of globals
GCC 10 changed a default flag from -fcommon to -fno-common. See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85678 for some background.
The change in gcc 10 results in link-time errors when building. For example:
../../src/xstat/.libs/liboafs_xstat_cm.a(xstat_cm.o):(.bss+0x2050):
multiple definition of `numCollections';
Ensure that only one definition exists for each global data object, and
change other references to use "extern" as needed.
To ensure that future changes do not introduce duplicated global
definitions, add the -fno-common flag to XCFLAGS when using the
configure --enable-checking setting.
[cwills@sinenomine.net: Note for 1.8.x: renamed terminationEvent
to cm_terminationEvent/fs_terminationEvent instead of deleting it.]
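A sketch of the one-definition pattern that -fno-common enforces;
numCollections is the symbol from the linker error above, with its type
shown as plain int for illustration:
    /* xstat_cm.h (shared header): declaration only */
    extern int numCollections;

    /* xstat_cm.c (exactly one translation unit): the single definition */
    int numCollections = 0;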
Reviewed-on: https://gerrit.openafs.org/14106
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 0e2072ae386d4111bef161eb955964b649c31386)
Change-Id: I54ca61d372cf763e4a28c0b0829ea361219f6203
Reviewed-on: https://gerrit.openafs.org/14217
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>