Rising Results in Sandra's Memory Bandwidth benchmark
Posted: Sun Feb 04, 2007 8:28 pm
Peter, I did start a new post over here, and finished typing it up; but the last part of it I decided to PM you instead. Wound up losing my post in the process of sending the PM. Careless of me, that. However, I could not sit and watch what I believe to be nonsense being posted without response in the thread in which it appeared.
O.K. I will state it all again, perhaps more clearly this time.
1) The K6-3+ can cache all the RAM addresses a Super 7 can throw at it; but it does not have sufficient cache lines to store all of the data required by the OS and Sandra while running the Memory Bandwidth test. NOTE : If Peter argues this I want KachiWachi's opinion on the subject.
2) Where a large amount of RAM is uncached at 3rd level, a significant amount of data required by Sandra and the OS is going to wind up residing in RAM that is uncached at third level. (Though it is at least address cached at 2nd level.)
3) When the CPU requires data it first searches the tag RAM to see if the contents of that RAM address are stored in one of the 2nd level cache lines. If the data is not available there, it then searches to see if it is stored in one of the 3rd level cache lines. If the data is stored in neither cache then the processor has to search RAM. When the processor finds the data, if it is residing in RAM that is uncached at 3rd level it writes the data into a 2nd level cache line, overwriting something else (because the second level cache is always full).
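That lookup order — 2nd level, then 3rd level, then RAM, with a fill into whatever level will accept the line — can be sketched in code. This is only an illustration under loud assumptions: fully associative LRU caches, made-up sizes, and a single "cacheable limit" standing in for RAM that is uncached at 3rd level. It is not the K6-3+'s actual set-associative hardware or the real Super 7 board cache controller.

```python
# Toy model of the lookup order in point 3. Assumptions (not real
# K6-3+/Super 7 hardware): fully associative caches, LRU replacement,
# and an L3 that simply cannot hold addresses at or above its limit.
from collections import OrderedDict

class ToyCache:
    def __init__(self, lines, cacheable_limit=None):
        self.lines = lines              # number of cache lines
        self.limit = cacheable_limit    # addresses >= limit are uncacheable here
        self.data = OrderedDict()       # key order doubles as LRU order

    def can_cache(self, addr):
        return self.limit is None or addr < self.limit

    def hit(self, addr):
        if addr in self.data:
            self.data.move_to_end(addr)     # mark most recently used
            return True
        return False

    def fill(self, addr):
        if not self.can_cache(addr):
            return                          # RAM uncached at this level
        if len(self.data) >= self.lines:
            self.data.popitem(last=False)   # evict least recently used line
        self.data[addr] = True

def access(l2, l3, addr):
    """Return which level serviced the access: 'L2', 'L3', or 'RAM'."""
    if l2.hit(addr):
        return "L2"
    if l3.hit(addr):
        return "L3"
    # Miss in both: fetch from RAM, filling whichever caches accept it.
    l2.fill(addr)
    l3.fill(addr)
    return "RAM"
```

With a tiny L2 and an L3 limited to low addresses, an address above the limit keeps landing in L2 and overwriting something else, exactly the behavior the point describes; an address below the limit that gets bumped out of L2 is still served from L3 on the next miss.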
4) When the processor next needs that overwritten data, it will no longer be available in second level cache, so the processor, having fruitlessly checked 2nd level cache, checks 3rd level cache. If the data is not there, then again the processor goes back to searching RAM. If this time the data is residing in RAM that is cached at 3rd level, then the processor writes that data into one of the 3rd level cache lines, and 2nd level too for all I know (in which case again overwriting something else).
5) Any time that the processor requires data that is not stored in second level cache, or 3rd level cache, it goes into RAM, and if the data is stored in RAM that is uncached at 3rd level the data WILL be written into 2nd level cache overwriting something else. Conversely, (and worse from the standpoint of cache optimization), if the data is found in either cache, the cache contents will not change.
6) Because the algorithms are written in such a manner that when something has to be overwritten, the most recently accessed data is preserved in second level cache, rather than data which CANNOT be cached at 3rd level because it is residing in RAM that is uncached at 3rd level; sometimes data which cannot be cached at 3rd level winds up being overwritten and thus is no longer available in cache at all. (These overwritten data points may wind up being brought into 2nd level cache and then bumped out again by being overwritten any number of times.)
7) Over a period of time, however, what happens is that those data points which can be cached in the 3rd level cache lines get cached there, and those data points which cannot be cached in the 3rd level cache lines wind up in the 2nd level cache lines. The effects of #6 above however delay this happening. What happens is, one by one, on a least used basis, the data points which can be cached in the 3rd level cache lines get overwritten in 2nd level cache. And the next time the processor searches RAM for that item it gets brought into 3rd level cache, leaving room in 2nd level cache for one more data point which cannot be cached at 3rd level, to reside in 2nd level cache, without being disturbed by being overwritten by some data point which has been cached at 3rd level. i.e. The more data points that get stored in the 3rd level cache lines, the less often the processor has to search RAM and bring a data point into 2nd level cache overwriting something else, thereby bumping it out of cache.
8) However, it is only when a data point, which is stored in RAM cached at 3rd level, gets overwritten in 2nd level cache, thereby allowing some other data point to reside in 2nd level cache, that the 3rd level cache optimizes to some degree, since the next time the processor needs the overwritten data point it WILL be brought into 3rd level cache, without bumping another needed data point out of 3rd level cache. Hence the process of optimization is slow.
9) Over a period of time what happens is the cache optimizes DESPITE the algorithm, (which always wants to bump the item least recently used out of 2nd level cache regardless of whether it can be cached at 3rd level or not), with those data points which cannot be cached in 3rd level cache accumulating in 2nd level cache and those data points which can be cached by 3rd level cache accumulating in 3rd level cache.
10) This results in progressively higher Sandra Memory Bandwidth scores because the cache is apparently not flushed between runs.
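The argument in points 3 through 10 can be run as a minimal simulation. Everything here is assumed for illustration: fully associative LRU caches, toy line counts, a working set swept in order, an L3 that cannot hold addresses at or above a limit (standing in for RAM uncached at 3rd level), and no flush between passes. In this simplified model the hit count jumps as soon as the L3-cacheable portion of the working set settles into L3; the eviction conflicts described in points 6 through 9 are what would make the rise more gradual on real hardware.

```python
# Toy simulation: repeated benchmark passes over a working set larger
# than L2, with nothing flushed between passes. All sizes are invented.
from collections import OrderedDict

def run_passes(n_passes, l2_lines=8, l3_lines=16, l3_limit=16, working_set=24):
    """Sweep working_set addresses n_passes times; return cache hits per pass."""
    l2, l3 = OrderedDict(), OrderedDict()   # key order doubles as LRU order

    def fill(cache, lines, addr):
        cache[addr] = True
        cache.move_to_end(addr)
        while len(cache) > lines:
            cache.popitem(last=False)       # evict least recently used line

    hits_per_pass = []
    for _ in range(n_passes):
        hits = 0
        for addr in range(working_set):
            if addr in l2:
                l2.move_to_end(addr)        # 2nd level hit
                hits += 1
            elif addr in l3:
                l3.move_to_end(addr)        # 3rd level hit
                hits += 1
            else:
                # Full miss: fetch from RAM. L2 accepts any address; the
                # board (3rd level) cache only addresses below its limit.
                fill(l2, l2_lines, addr)
                if addr < l3_limit:
                    fill(l3, l3_lines, addr)
        hits_per_pass.append(hits)
    return hits_per_pass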
11) That, I hope, is a clear, (though not 100% accurate), explanation of the rising results in the Sandra Memory Bandwidth benchmark that are associated, and associated only, with systems having a large amount of RAM uncached at 3rd level. Were all of that RAM cached at 3rd level the initial results would be higher; but subsequent runs would show no gain.
NOTE : Stedman did NOT have a large amount of RAM uncached at 3rd level.
Where none of the RAM is cached at 3rd level the initial results will be lower, except that apparently there is some cost to cache operation which will allow slightly higher initial results if cache is disabled; but at the cost of losing the gains associated w/ cache optimization over a period of time. (KachiWachi gets better results w/ cache disabled; but I don't think he has a large amount of RAM uncached at 3rd level, - so he has nothing to lose.)
12) The point you seem to fail to understand is that having large amounts of uncached RAM at 3rd level does not give one higher results. What it does is give you lower results than you would have got if the same amount of ram were cached at 3rd level. It is only with repeated runs gradually optimizing the cache contents that the result becomes what you would have got in the first place if all of the same amount of RAM had been cached at 3rd level.
13) As for gain shown by other programs running after Sandra, what is happening there is these programs are taking advantage of Sandra's optimization of the cache at both 2nd, and 3rd level. But running these other programs multiple times in succession after running Sandra will not show further gains, and may well show a loss; probably brought on by cache flushing.
14) Finally re "CHEAT BENCH", I think that is out of line. Real world aplications that run for an extended period of time doing repetitive operations, (and at the machine code level they most certainly do), will also show the benefit of cache optimization. It may not be readily measured, but it is there. You want to check it, run a background application that tracks cache hits. That would be the easiest way to settle this whole exceedingly stupid argument. In any case as long as a person posting results specifies exactly how he or she got them, it cannot be called cheating.
O.K. I will state it all again, perhaps more clearly this time.
1) The K6-3+ can cache all the RAM addresses a Super 7 can throw at it; but it does not have sufficient cache lines to store all of the data required by the OS and Sandra while running the Memory Bandwidth test. NOTE : If Peter argues this I want KachiWachi's opinion on the subject.
2) Where a large amount of RAM is uncached at 3rd level, a significant amount of data required by Sanda and the OS is going to windup residing in RAM that is uncached at third level. (Though it is at least address cached at 2nd level.)
3) When the CPU requires data it first searchs the tag ram to see if the contents of that RAM address is stored in one of the 2nd level cache lines. If the data is not available there, it then searchs to see if it is stored in one of the 3rd level cache lines. If the data is stored in neither cache then the processor has to search RAM. When the processor finds the data, if it is residing in RAM that is uncached at 3rd level it writes the data into a 2nd level cache line overwriting something else. (Because the second level cache is always full)
4) When the processor next needs that overwritten data, it will no longer be available in second level cache, so the processor, having fruitlessly checked 2nd level cache, checks 3rd level cache. If the data is not there, then again the processor goes back to searching RAM. If this time the data is residing in RAM that is cached at 3rd level cache, then the processor writes that data into one of the 3rd level cache lines, and 2nd level too for all I know. (In which case again overwriting something else).
5) Any time that the processor requires data that is not stored in second level cache, or 3rd level cache, it goes into RAM, and if the data is stored in RAM that is uncached at 3rd level the data WILL be written into 2nd level cache overwriting something else. Conversely, (and worse from the standpoint of cache optimization), if the data is found in either cache, the cache contents will not change.
6) Because the Algorithyms are written in such a manner that when something has to be overwritten, the most recently accessed data is preserved in second level cache, rather than data which CANNOT be cached at 3rd level because it is residing in RAM that is uncached at 3rd level; sometimes data which cannot be cached at 3rd level winds up being overwritten and thus is no longer available in cache at all. (These overwritten data points may windup being brought into 2nd level cache and then bumped out again by being overwritten any number ot times.)
7) Over a period of time, however, what happens is that those data points which can be cached in the 3rd level cache lines get cached there, and those data points which cannot be cached in the 3rd level cache lines wind up in the 2nd level cache lines. The effects of #6 above however delay this happening. What happens is, one by one, on a least used basis, the data points which can be cached in the 3rd level cache lines get overwritten in 2nd level cache. And the next time the processor searchs RAM for that item it gets brought into 3rd level cache, leaving room in 2nd level cache for one more data point which cannot be cached at 3rd level, to reside in 2nd level cache, without being disturbed by being overwritten by some data point which has been cached at 3rd level. i.e. The more data points that get stored in the 3rd level cache lines, the less often the processor has to search RAM and bring a data point into 2nd level cache overwriting something else, thereby bumping it out of cache.
8) However, it is only when a data point, which is stored in RAM cached at 3rd level, gets overwritten in 2nd level cache, thereby allowing some other data point to reside in 2nd level cache, that the 3rd level cache optimizes to some degree, since the next time the processor needs the overwritten data point it WILL be brought into 3rd level cache, without bumping another needed data point out of 3rd level cache. Hence the process of optimization is slow.
9) Over a period of time what happens is the cache optimizes DESPITE the algorithym, (which always wants to bump the item least recently used out of 2nd level cache regardless of whether it can be cached at 3rd level or not), with those data points which cannot be cached in 3rd level cache accumulating in 2nd level cache and those data points which can be cached by 3rd level cache accumulating in 3rd level cache.
10) This results in progressively higherSandra Memory Bandwidth scores because the cache is apparently not flushed between runs.
11) That, I hope, is a clear, (though not 100% accurate), explanation of the rising results in the Sandra Memory Bandwidth benchmark that are associated, and associated only, with systems having a large amount of RAM uncached at 3rd level. Were all of that RAM cached at 3rd level the initial results would be higher; but subsequent runs would show no gain.
NOTE : Stedman did NOT have a large amount of RAM uncached at 3rd level.
Where none of the RAM is cached at 3rd level the initial results will be lower, except that apparently there is some cost to cache operation which will allow slightly higher initial results if cache is disabled; but at the cost of losing the gains associated w/ cache optimization over a period of time. (KachiWachi gets better results w/ cache disabled; but I don't think he has a large amount of RAM uncached at 3rd level, - so he has nothing to lose.)
12) The point you seem to fail to understand is that having large amounts of uncached RAM at 3rd level does not give one higher results. What it does is give you lower results than you would have got if the same amount of ram were cached at 3rd level. It is only with repeated runs gradually optimizing the cache contents that the result becomes what you would have got in the first place if all of the same amount of RAM had been cached at 3rd level.
13) As for gain shown by other programs running after Sandra, what is happening there is these programs are taking advantage of Sandra's optimization of the cache at both 2nd, and 3rd level. But running these other programs multiple times in succession after running Sandra will not show further gains, and may well show a loss; probably brought on by cache flushing.
14) Finally re "CHEAT BENCH", I think that is out of line. Real world aplications that run for an extended period of time doing repetitive operations, (and at the machine code level they most certainly do), will also show the benefit of cache optimization. It may not be readily measured, but it is there. You want to check it, run a background application that tracks cache hits. That would be the easiest way to settle this whole exceedingly stupid argument. In any case as long as a person posting results specifies exactly how he or she got them, it cannot be called cheating.