gnuplot> delta=0.01
gnuplot> g(x) = (norm(x+delta)-norm(x))/delta
gnuplot> set title "Gaussian Normal Density"
gnuplot> plot [-4:4] [0:0.5] g(x) notitle lw 3
gnuplot> set term post eps "Helvetica" 24
gnuplot> set out "normal.eps"
gnuplot> replot
[rdv@localhost systems-software]$ file normal.eps
normal.eps: PostScript document text conforming at level 2.0 - type EPS
[rdv@localhost systems-software]$ convert -size 720x504 -resize 720x504 normal.eps normal.png
[rdv@localhost systems-software]$ file !$
file normal.png
normal.png: PNG image data, 720 x 504, 16-bit/color RGB, non-interlaced
[rdv@localhost systems-software]$ display !$
display normal.png
(Note that gnuplot's norm(x) is the cumulative normal distribution, which is why the plot above takes a finite difference to approximate the density.) The normal distribution is a continuous distribution; its discrete counterpart is the Poisson distribution.
startclock();
for ( i = 0 ; i < NUMREPS ; i++ )
    do_short_operation();
stopclock();

for some value of NUMREPS like 100 or 1000. This still doesn't tell you about the exact distribution of the time for the short operations, but it can tell you about the mean.
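As a hedged illustration of that repetition trick, here is one way it might look using the POSIX clock_gettime() call. startclock(), stopclock(), and do_short_operation() above are just placeholders, so everything below is a made-up stand-in rather than the actual measurement code.

#include <stdio.h>
#include <time.h>

#define NUMREPS 1000                /* number of repetitions of the short operation */

static void do_short_operation(void)
{
    /* stand-in for whatever operation you are actually measuring */
    volatile int x = 0;
    x++;
}

int main(void)
{
    struct timespec start, stop;
    int i;

    clock_gettime(CLOCK_MONOTONIC, &start);    /* "startclock()" */
    for (i = 0; i < NUMREPS; i++)
        do_short_operation();
    clock_gettime(CLOCK_MONOTONIC, &stop);     /* "stopclock()" */

    double elapsed = (stop.tv_sec - start.tv_sec)
                   + (stop.tv_nsec - start.tv_nsec) / 1e9;
    printf("total %.9f s, mean %.9f s per operation\n",
           elapsed, elapsed / NUMREPS);
    return 0;
}

Dividing the total by NUMREPS gives the mean; the spread of the individual operation times is still hidden inside the sum.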
A few of you have already hit on using the Intel processor Time Stamp Counter (TSC). That's an excellent idea, but it does have drawbacks:
Recently, for a project, I adapted the function gsl_fit_linear from the GNU Scientific Library (GSL) into some code of my own. The adaptation was actually a hassle, so I don't recommend doing it, but for what it's worth, here is the function as it appears in the library.
/* Fit the data (x_i, y_i) to the linear relationship
Y = c0 + c1 x
returning,
c0, c1 -- coefficients
cov00, cov01, cov11 -- variance-covariance matrix of c0 and c1,
sumsq -- sum of squares of residuals
This fit can be used in the case where the errors for the data are
unknown, but assumed equal for all points. The resulting
variance-covariance matrix estimates the error in the coefficients
from the observed variance of the points around the best fit line.
*/
int
gsl_fit_linear (const double *x, const size_t xstride,
                const double *y, const size_t ystride,
                const size_t n,
                double *c0, double *c1,
                double *cov_00, double *cov_01, double *cov_11, double *sumsq)
{
  double m_x = 0, m_y = 0, m_dx2 = 0, m_dxdy = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      m_x += (x[i * xstride] - m_x) / (i + 1.0);
      m_y += (y[i * ystride] - m_y) / (i + 1.0);
    }

  for (i = 0; i < n; i++)
    {
      const double dx = x[i * xstride] - m_x;
      const double dy = y[i * ystride] - m_y;
      m_dx2 += (dx * dx - m_dx2) / (i + 1.0);
      m_dxdy += (dx * dy - m_dxdy) / (i + 1.0);
    }

  /* In terms of y = a + b x */
  {
    double s2 = 0, d2 = 0;
    double b = m_dxdy / m_dx2;
    double a = m_y - m_x * b;

    *c0 = a;
    *c1 = b;

    /* Compute chi^2 = \sum (y_i - (a + b * x_i))^2 */
    for (i = 0; i < n; i++)
      {
        const double dx = x[i * xstride] - m_x;
        const double dy = y[i * ystride] - m_y;
        const double d = dy - b * dx;
        d2 += d * d;
      }

    s2 = d2 / (n - 2.0);        /* chisq per degree of freedom */

    *cov_00 = s2 * (1.0 / n) * (1 + m_x * m_x / m_dx2);
    *cov_11 = s2 * 1.0 / (n * m_dx2);
    *cov_01 = s2 * (-m_x) / (n * m_dx2);

    *sumsq = d2;
  }

  return GSL_SUCCESS;
}
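If you only want the fit rather than an adaptation of the code, calling the library directly is much less of a hassle. Here is a minimal sketch; the data values are made up, and you would link with -lgsl -lgslcblas -lm.

#include <stdio.h>
#include <gsl/gsl_fit.h>

int main(void)
{
    /* made-up sample data: y is roughly 2 + 3x plus a little noise */
    double x[5] = { 0.0, 1.0, 2.0, 3.0, 4.0 };
    double y[5] = { 2.1, 4.9, 8.2, 10.9, 14.1 };
    double c0, c1, cov00, cov01, cov11, sumsq;

    /* strides of 1 mean the data are packed contiguously in the arrays */
    gsl_fit_linear(x, 1, y, 1, 5,
                   &c0, &c1, &cov00, &cov01, &cov11, &sumsq);

    printf("best fit: y = %g + %g x  (sum of squared residuals %g)\n",
           c0, c1, sumsq);
    return 0;
}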
Sometimes a line is a good fit for only part of your total data, and a different line fits a later portion of the data; such a case is called a multi-linear fit.
Four questions on the memory hierarchy:
If you program in C at all, you should be familiar with pointers by now, but let me go over them quickly...
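As a quick refresher, here is a minimal sketch of the two operations you need for the rest of this lecture: taking an address with & and dereferencing it with *.

#include <stdio.h>

int main(void)
{
    int n = 42;
    int *p = &n;          /* p holds the address of n */

    printf("n = %d, *p = %d\n", n, *p);   /* *p dereferences the pointer */
    *p = 7;                               /* writing through the pointer changes n */
    printf("now n = %d\n", n);

    int a[3] = { 10, 20, 30 };
    int *q = a;                           /* an array name decays to a pointer */
    printf("a[1] = %d, *(q+1) = %d\n", a[1], *(q + 1));
    return 0;
}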
Sometimes, memory is wasted due to a process known as fragmentation. Fragmentation occurs when various objects are created and deleted, leaving behind holes in the memory space. The memory manager's job is to see that applications can always get the memory they need, by using an algorithm that minimizes fragmentation and keeps holes under control.
Several different algorithms can be used to assign memory to the next request that comes in:
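As one illustration, the classic first fit strategy walks the free list and carves the request out of the first hole that is big enough. The sketch below is purely illustrative; the structure and function names are made up, not taken from any real allocator.

#include <stdio.h>
#include <stddef.h>

struct hole {
    char *base;            /* start address of this free region */
    size_t size;           /* size of the free region in bytes */
    struct hole *next;     /* next hole in the free list */
};

/* First fit: hand out space from the first hole that is big enough,
 * shrinking that hole; return NULL if no hole can satisfy the request. */
static void *first_fit_alloc(struct hole *free_list, size_t request)
{
    struct hole *h;
    for (h = free_list; h != NULL; h = h->next) {
        if (h->size >= request) {
            void *p = h->base;
            h->base += request;        /* carve the allocation off the front */
            h->size -= request;
            return p;
        }
    }
    return NULL;
}

int main(void)
{
    static char arena[1024];                       /* pretend this is "memory" */
    struct hole h2 = { arena + 512, 512, NULL };   /* a 512-byte hole */
    struct hole h1 = { arena, 100, &h2 };          /* a smaller 100-byte hole first */

    void *p = first_fit_alloc(&h1, 200);           /* too big for h1, so it comes from h2 */
    printf("allocated 200 bytes at offset %ld\n", (long)((char *)p - arena));
    return 0;
}

Best fit instead searches the whole list for the smallest adequate hole; either way, allocations and frees leave holes behind, which is exactly the fragmentation described above.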
[rdv@dhcp-143-236 ~]$ more /proc/buddyinfo
Node 0, zone      DMA      2      4      3      4      5      4      2      2      3      1      1
Node 0, zone   Normal    242    110    156    111     78     43     20      7      7      4      3
Node 0, zone  HighMem      2      0      0      1      1      1      0      0      0      0      0
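Linux's physical-page allocator is a buddy system, and each column of /proc/buddyinfo is the number of free blocks of a given order, from order 0 (one page) up to order 10 (1024 contiguous pages). As a quick sanity check on the Normal-zone line above (assuming 4KB pages), you can sum count times 2^order pages:

#include <stdio.h>

int main(void)
{
    /* free-block counts for the Normal zone, copied from /proc/buddyinfo above */
    int counts[11] = { 242, 110, 156, 111, 78, 43, 20, 7, 7, 4, 3 };
    long pages = 0;
    int order;

    for (order = 0; order <= 10; order++)
        pages += (long)counts[order] << order;   /* 2^order pages per block of this order */

    printf("%ld free pages, about %ld KB (4KB pages)\n", pages, pages * 4);
    return 0;
}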
The original form of multiprogramming actually involved swapping complete processes into and out of memory, to a special reserved area of disk (or drum). This approach allowed each process to act as if it owned all of the memory in the system, without worrying about other processes. However, swapping a process out and in is not fast!
The simplest approach would be a large, flat page table with one entry per page. The entries are known as page table entries, or PTEs. However, this approach results in a page table that is too large to fit inside the MMU itself, meaning that it has to be in memory. In fact, for a 4GB address space with 32-bit (four-byte) PTEs and 4KB pages, that is 2^20 entries, so the page table alone is 4MB, per process! That's big when you consider that there might be a hundred processes running on your system, each needing its own table.
The solution is multi-level page tables: the parts of the table that cover unused regions of the address space don't have to exist at all. As the process grows and additional pages are allocated, the matching parts of the page table are filled in.
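To make the multi-level idea concrete, here is a hedged sketch of the classic two-level split used for 32-bit addresses with 4KB pages: the top 10 bits of the virtual address index the page directory, the next 10 bits index a second-level page table, and the low 12 bits are the offset within the page. A real MMU does this walk in hardware; the code just shows the arithmetic.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t vaddr = 0x12345678;                    /* an arbitrary example address */

    uint32_t dir_index   = (vaddr >> 22) & 0x3FF;   /* top 10 bits: page directory index */
    uint32_t table_index = (vaddr >> 12) & 0x3FF;   /* next 10 bits: page table index */
    uint32_t offset      =  vaddr        & 0xFFF;   /* low 12 bits: offset within the 4KB page */

    printf("vaddr 0x%08x -> directory %u, table %u, offset 0x%03x\n",
           (unsigned)vaddr, (unsigned)dir_index,
           (unsigned)table_index, (unsigned)offset);

    /* A real walk would then be roughly:
     *   pt   = page_directory[dir_index];    address of the second-level table
     *   page = pt[table_index];              physical page frame number
     *   physical address = (page << 12) | offset;
     */
    return 0;
}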
The translation from virtual to physical address must be fast. This argues for doing as much of the translation as possible in hardware, but the tradeoff is more complex hardware and more expensive process switches. Since it is not practical to put the entire page table in the MMU, the MMU includes what is called the TLB (translation lookaside buffer), a small cache of recently used virtual-to-physical translations.
(Images from O'Reilly's book on Linux device drivers, and from lvsp.org.)
We don't have time to go into the details right now, but you should be aware that page tables for a 64-bit processor are a lot more complicated once performance is taken into consideration.
Linux has traditionally used a three-level page table; on x86-64 a fourth level was added, with 512 entries at each level: "With Andi's patch, the x86-64 architecture implements a 512-entry PML4 directory, 512-entry PGD, 512-entry PMD, and 512-entry PTE. After various deductions, that is sufficient to implement a 128TB address space, which should last for a little while," says Linux Weekly News. (512 entries is 9 bits of index per level, so four levels plus the 12-bit page offset cover 48 bits of virtual address.)
#define IA64_MAX_PHYS_BITS	50	/* max. number of physical address bits (architected) */
...
/*
 * Definitions for fourth level:
 */
#define PTRS_PER_PTE	(__IA64_UL(1) << (PTRS_PER_PTD_SHIFT))
Lecture 7, June 14: Page Replacement Algorithms
We will also talk a little bit about memory-mapped files.
Followup from this week: