glibc/glibc-getnprocs.diff

149 lines
3.9 KiB
Diff

This bug is in reference to a bug introduced in glibc 2.11 but present in glibc
HEAD by the following patch set:
http://sourceware.org/git/?p=glibc.git;a=commitdiff;h=6a3d03ff58742430a252beac4a1917506512e319
The patch came from this bugzilla bug:
https://bugzilla.redhat.com/show_bug.cgi?id=494631
Which simply proposed the first part:
+ nl = memchr (*cp, '\n', *re - *cp);
+ while (nl == NULL && *re == buffer_end)
+ {
+ /* Truncate too long lines. */
+ *re = buffer + 3 * (buffer_end - buffer) / 4;
+ n = read_not_cancel (fd, *re, buffer_end - *re);
+ if (n < 0)
+ return NULL;
+
+ nl = memchr (*re, '\n', n);
+ **re = '\n';
+ *re += n;
+ }
}
+ else
+ nl = memchr (*cp, '\n', *re - *cp);
This was checked in with a second part:
else if (nl + 5 >= *re)
{
memmove (buffer, nl, *re - nl);
*re = buffer + (*re - nl);
nl = *cp = buffer;
ssize_t n = read_not_cancel (fd, *re, buffer_end - *re);
if (n < 0)
return NULL;
*re += n;
}
Which is meant to determine whether there's enough room at the end of the
buffer to hold "cpu*\n".
else if (nl + 5 >= *re)
{
memmove (buffer, nl, *re - nl);
*re = buffer + (*re - nl);
nl = *cp = buffer;
ssize_t n = read_not_cancel (fd, *re, buffer_end - *re);
if (n < 0)
return NULL;
*re += n;
}
*cp = nl + 1;
This second block is erroneous (and redundant). If there's not enough room to
complete a cpu[:digit:] line it'll try to move what's partially in the end of
the buffer to the front (preserving that portion) and then read in a bunch
more, assuming that a \n is in the next read.
But it actually ends up overwriting the current line with the contents of the
second line.
Consider the following line
| C | P | U | 1 | \n | C | P | U | 2 | \n | C | P | U | 3 | F | O | O | B | A |
R | F | O | O | B |
A | R | \n | ...
where sizeof(buffer) == 12.
The first time through 'buffer' is filled as follows:
| C | P | U | 1 | \n | C | P | U | 2 | \n | C | P |
and 'cpu1' is the line that is returned. The second time through 'cpu2' is the
line that is supposed to be returned but this problematic branch is hit and the
last C P is copied to the front of the buffer over top of C P U 2. Following
this U 3 F O O B A R F O is read into the rest of 'buffer':
| C | P ||| U | 3 | F | O | O | B | A | R | F | O |
This throws off the whole incrementing calculation.
This second else block isn't necessary. Jakub's first part covers the scenario
shown in the example above quite well:
The following test case:
#include<stdio.h>
#include<sys/sysinfo.h>
int main()
{
int lcpus=get_nprocs();
printf("logical cpus = %d\n",lcpus);
return 0;
}
Can be run against the attached /proc/stat file which is known to reproduce the
problem:
This stat file can be used with the testcase by bind mounting it over
/proc/stat:
cp stat /dev/shm/stat
mount --bind /dev/shm/stat /proc/stat
When run this should show:
logical cpus = 1024
since cpu1024 is high cpu number. But it shows something like:
logical cpus = 137
Also attached is a patch which removes the erroneous else block.
When re-run against with this patch it reports the correct number of cpus.
Thanks to Milton Miller for describing this problem.
--- glibc-2.11.1/sysdeps/unix/sysv/linux/getsysstats.c 2010-01-18 11:01:41.000000000 -0600
+++ glibc-2.11.1-new/sysdeps/unix/sysv/linux/getsysstats.c 2010-03-23 08:00:26.000000000 -0500
@@ -117,18 +117,6 @@
if (nl == NULL)
nl = *re - 1;
}
- else if (nl + 5 >= *re)
- {
- memmove (buffer, nl, *re - nl);
- *re = buffer + (*re - nl);
- nl = *cp = buffer;
-
- ssize_t n = read_not_cancel (fd, *re, buffer_end - *re);
- if (n < 0)
- return NULL;
-
- *re += n;
- }
*cp = nl + 1;
assert (*cp <= *re);