SiS530 @133MHz (GA-5SMM/Soyo 5SSM) setup, benching, tweaking

Discussion relating to Socket 7 hardware.
Stedman5040
Veteran K6'er
Posts: 271
Joined: Mon Sep 05, 2005 4:22 pm

Post by Stedman5040 »

Getting closer with superpi 1M now down at 305 seconds with K6-III+/620 (124x5) using 4x divider. Where do I go from here? Looks like I might need another wpcredit tweak.

Stedman
Super7Dude
K6'er
Posts: 77
Joined: Mon Aug 28, 2006 8:05 pm
Location: Brisbane, Queensland, Australia

Post by Super7Dude »

I can't get mine to boot to Windows at 5x124MHz even with 2.2v :(
I'm wondering whether I'll have better luck if I buy another one on eBay... :lol:

Post by Super7Dude »

Well, I've managed to find a K6-III+/500ANZ... so I'm now eagerly waiting for it to arrive :D

Post by Stedman5040 »

Wow! I finally got the SiS530 chipset to do Superpi (1M) in less than 300 seconds. Had to use the Sandra memory benchmark trick.

Fired up the cpu to 600MHz (4.5x133), then ran the Sandra 2004 memory bench for four runs. Ran Superpi 1M and bingo, the time came in at 293 seconds. :lol:

Stunning really for a completely condemned chipset. Not only did it beat the 5 min barrier by a full 7 seconds, but it was also faster than my time for the EP-MVP3G2 at 617MHz (112x5.5) by 1 second.
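
As a quick check of the figures in this post, the nominal core clock is just the multiplier times the bus speed. The sketch below assumes nominal bus speeds of 133 and 112 MHz (real boards often run slightly off nominal, which may be why 112x5.5 is quoted as 617MHz), and `core_clock_mhz` is a made-up helper name.

```python
def core_clock_mhz(multiplier, fsb_mhz):
    """Nominal core clock: multiplier times front-side bus speed."""
    return multiplier * fsb_mhz

# Settings quoted in the post; bus speeds are nominal values (assumption).
sis530_clock = core_clock_mhz(4.5, 133)   # GA-5SMM run
mvp3_clock = core_clock_mhz(5.5, 112)     # EP-MVP3G2 run

# Margin under the 5 minute (300 second) barrier for the 293 second run.
margin_s = 5 * 60 - 293

print(f"SiS530: {sis530_clock} MHz, MVP3: {mvp3_clock} MHz, margin: {margin_s} s")
```

On nominal figures the SiS530 run is at the lower core clock of the two, which is what makes the 1 second lead notable.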

I'll try to get some other benches up this weekend for your perusal.

Stedman 8)

Post by Super7Dude »

Wow, congrats! Maybe that sort of "primes" or "warms up" the memory interface and then superpi can take advantage of it :)
Jim
K6'er Elite
Posts: 1745
Joined: Wed Jan 21, 2004 7:10 pm
Location: Toronto

Post by Jim »

Bet if you try my "trick" with the Epox it will be faster still. For those not familiar w/ this means of getting higher scores, it is not hard and fast in the way it works. It depends (I believe) on how much RAM you have uncached at third level. The more RAM that is uncached at third level, the more often Sandra's "Memory Bandwidth" test scores will rise w/ consecutive runs.

Superpuppy-3 Had a 1 Meg cache board & 512 Meg of RAM, of which 256 Meg was cached @ 3rd level.
Sandra Memory Bandwidth scores for Superpuppy-3 rose as follows :
INT MMX : 206; 212; 215; 216 after which they started to fall off.
Float FPU : 209; 212; 214; 214 after which they started to fall off.

Everest : Before Sandra :
Memory Read = 312 MB/s
Memory Write = 164 MB/s
Memory Latency = 206.4 ns

Superpuppy-2 Had a 512K cache board & 768 Meg of RAM, of which 128 Meg was cached @ 3rd level.
Sandra Memory Bandwidth scores for Superpuppy-2 rose as follows :
INT MMX : 210; 213; 229; 231; 238; 241 after which they started to fall off.
Float FPU : 210; 216; 228; 234; 237; 237 after which they started to fall off.

Everest            Before Sandra    After Sandra
Memory Read        361 MB/s         365 MB/s
Memory Write       177 MB/s         201 MB/s
Memory Latency     182.8 ns         184.3 ns

So what you have to do is run the Sandra memory bandwidth test several times in succession, noting the results, and see how many times your configuration will lead to rising results. Once you know how often the results will rise before they start to fall, then you can run the Sandra memory bandwidth test that number of times before running some other benchmark and see if it gets you better results in the other benchmark.

You may also want to try running the Sandra memory bandwidth test one time fewer than the number of times you get rising results before running your other benchmark. That way the "falling result" run is not the one due next, and you can see if that gives you still better results in the other benchmark.
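
The procedure described above can be sketched as a small helper that counts how many runs, starting from the first, each gave a new best score. The function name is hypothetical, and treating a tie as the end of the rise is an assumption. The scores are the ones quoted in this post.

```python
def rising_run_count(scores):
    """Count runs from the start that each beat the run before (first run included)."""
    if not scores:
        return 0
    count = 1
    for previous, current in zip(scores, scores[1:]):
        if current <= previous:  # a tie or a drop ends the rising streak (assumption)
            break
        count += 1
    return count

# Sandra Memory Bandwidth scores quoted above, in run order.
SANDRA_SCORES = {
    "Superpuppy-3 INT MMX": [206, 212, 215, 216],
    "Superpuppy-3 Float FPU": [209, 212, 214, 214],
    "Superpuppy-2 INT MMX": [210, 213, 229, 231, 238, 241],
    "Superpuppy-2 Float FPU": [210, 216, 228, 234, 237, 237],
}

for name, scores in SANDRA_SCORES.items():
    n = rising_run_count(scores)
    print(f"{name}: warm up with {n} runs, or try {max(n - 1, 0)}")
```

The count only tells you how many Sandra runs to do before the other benchmark. Whether that actually helps still has to be measured on each configuration.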
Superpuppy 3
K6-3+ 450 ACZ (6x100)
DFI K6BV3+/66 Rev B2 (2 Meg) w/ 2x28mm Chipset Fans
2x256 Meg PC 133 Hynix SDRAM
1x 20G Maxtor (7200)
2x 80G Maxtor (7200) Ducted w/ 2x486 Fans Mount
52/24/52/16 LG CDR/RW/DVD
8/4/3/12/24/16/32 LG Super Multi
ATI 9000 aiw Radeon AGP
SB Audigy 1 MP3 Sound
CMD 649 IDE Controller
NEC USB 2 Card

Post by Stedman5040 »

@Jim

The very interesting thing with this setup is that I only had 192 MB of RAM installed and no onboard cache enabled, so the only cache available was on the cpu itself. My Sandra scores were about 214/211 and they did not move at all during the four membench runs. Still, it is very interesting that the score should be so significantly better after the Sandra runs. Before Sandra the Superpi 1M run gives 310 seconds, and afterwards 293 seconds. I didn't see a change in my Everest scores either. It is all a mystery to me, but it got the SiS530 below 300 seconds, which must be a minor miracle.

Stedman :)

Post by Jim »

Tony, that supports what I was saying about rising Sandra scores being linked to the amount of RAM uncached at 3rd level. Since you had your 3rd level cache disabled, neither your Sandra scores nor your Everest scores got better. (Same thing happens to KGB, w/ his defective cache board.) The fact that your Superpi 1M result still got better is something else. Somehow consecutive runs of the Sandra Memory Bandwidth test optimize memory performance. I always thought that it was caused by a rising incidence of cache hits at 3rd level, but apparently it is something more than that. Could be it affects 2nd level cache hits too, though why that would improve your Superpi results but not Sandra or Everest is a mystery to me.
Last edited by Jim on Sun Feb 04, 2007 8:55 am, edited 1 time in total.
DonPedro
K6'er Elite
Posts: 578
Joined: Wed Jul 27, 2005 2:11 pm

Post by DonPedro »

@stedman and super7dude,

had almost no time this week to participate in your lively discussion. just sneaked by now and then to follow your posts. you really produced some wonderful numbers! :) everest read beyond 400!!! who would have expected that from the sis530 chipset? I just wonder why super7dude on the one hand finally got even better numbers in everest (417!!) and also managed to move the hotcpu-memory benches beyond 200 when compared to your results, stedman, but on the other hand still lags behind in superpi 1m ..... what could the reason be for this?

re low density 512mb ram modules: I just checked and it took a while to find a place where they sell these babies. seem to be very rare. the price for the stick I found is 72 euros for the cl2 type :(

@jim
I do not see why a ram bench should show higher numbers on repeated runs on the grounds of "The more RAM that is uncached at third level, ....". I hold two considerations against this view. first, if a k6-3 or any "+" model is used, then the cpu is by its own means capable of caching 4gb of ram, independently of what the mainboard can do in this respect. second, the strategy behind implementing a cache subsystem is to reduce the number of accesses to (much slower) main memory. this works when the memory address accessed is within the cache's capabilities and the value of the memory at the respective address has not changed since the last access. so repeated runs of the very same application should give better results when the ram is cached, not uncached. it would lack a good explanation if it happened the other way round. I would suggest that the approach the programmers of sandra have taken to measure memory speed is to blame. btw, which version of sandra did you use? I will try it on another system (intel celeron 766 on an intel zx-chipset board) to see what happens there. I don't know if this will give some insight, because this board has no mb-cache at all, but let's see.


back to the new results super7dude has sent me. this time the chart is about

tweaking the system with wpcredit

the chart below shows the results super7dude got with wpcredit tweaks applied (except keeping memory timings at 3-3-3-8) and the comparison with the untweaked setup (K6-3+600, 133/133, mb-cache disabled, gf2-mx).

what we see raises a question: how well does a (tremendous) performance gain scored in everest translate into something useful? both synthetic benchmarks (everest, hot cpu tester) show that the tweaks work. of hot cpu tester's semi-synthetic benches, only large objects sort shows a gain similar to what we've just seen. the situation is quite different in osmark, with the exception of richedit and pngout, where we see +8% better performance. a handful of other benches score +3 to +5% better. but the surprise comes with the graphics-related benches. they not only do not gain in performance, they lose, and they lose by big margins of between 15 and 23%.

next post will be the results of the same setup but with 2-3-2-5 memory timings.
Attachments
s7dude k6-3-600@133 wpcredit tweaks on-off.png (22.16 KiB) Viewed 19910 times

Post by Jim »

Peter, I think you misunderstand how cache works, though maybe it's me who doesn't understand. If all the data in RAM were completely cached you would not get "Rising Results". Instead what you would get is the best possible result (given the equipment you are using) on the first run, and on every run thereafter. The results would be perfectly flat, unvarying (at least as far as cache is concerned). In point of fact the results would be similar to disabling your third level cache, or having none, not that that will give you the best possible result, because in most cases it won't.

The reason I say that the Sandra results rise in proportion to the amount of Ram uncached at 3rd level is this. The 2nd level cache may have a large enough tag ram to be able to retain all the addresses of all the ram installed; but it does not have enough cache lines to store more than a tiny fraction of the contents of all the ram installed.

3rd level cache is accessed when the CPU is unable to find what data it is looking for in 2nd level cache. When the CPU is unable to find what it is looking for in either second level or third level cache then the CPU has to go hunting in RAM. This results in the required data winding up in cache.

If the data was in RAM that is uncached at 3rd level, it winds up being cached in 2nd level cache where on subsequent searches the processor can find the required data more quickly.

I think what is happening in the Sandra Memory Bandwidth test is that the K6-3+ processors simply do not have enough cache lines to store all the data that the OS and Sandra use. The CPU then has to go to third level cache (which is larger, [i.e. more lines], in most cases) and look for the data there. If it gets a hit, fine; if not, then it's on to RAM. But each time it goes to RAM, the data winds up in cache.

The more Ram that is uncached at 3rd level the more of the data that the OS and Sandra require that will initially be uncached. So what you get is more and more of the required data from uncached RAM at 3rd level accumulating in 2nd level cache overwriting other data. If the over written data was stored in RAM which is cached at 3rd level, that data when reaccessed accumulates in 3rd level cache. So what you get, I think, is a shuffling of data sorting itself out in accordance with whether or not it was initially stored in RAM cached at 3rd level with the result being higher and higher benchmarks. I think that if all the RAM were cached at 3rd level, the initial results would be higher; but they would also be flat on subsequent runs. Nowhere to go, because everything required would be in cache after one run.

EDIT : Logically the more RAM that is uncached at 3rd level, the longer it will take to shuffle out which items wind up being cached at third level and which get cached at second level. Additionally, I would say, that the only possible way that I can see, that running Sandra's Memory Bandwidth benchmark program a number of times, could influence a different benchmark program that followed, is by concentrating those elements required to run the second benchmark in cache before the program starts. The part that I don't understand, is why you get rising results for x number of runs, and then the results fall off. You would think that they would flatten out.

You should read the article at the link KachiWachi posted on cache operation.
http://www.pcguide.com/ref/mbsys/cache/func.htm

And finally I use Sandra 2004.

One other thing of a more general nature with respect to this thread and others. Question, Gentlemen: If you are going to post benchmark results, why do you not post the tweaks used to obtain said results? Is this a competition wherein everyone keeps their methods secret? Or is it a co-operative effort wherein people help one another? I have noticed that very few people here post the methods they use to attain their posted results. That is very disappointing to me. I share my methods and Stedman shares his, but I would like to see others doing that too.
Last edited by Jim on Sun Feb 04, 2007 9:13 am, edited 4 times in total.

Post by Stedman5040 »

I use sandra2004 as well. the tweaks used for wpcredit are documented earlier in the thread, but I will post them once again when I get all my data together for the results.

Stedman

Post by DonPedro »

@jim,
I think I understand very well what caching is about. many many years ago I undertook the task of programming a disk-caching program from scratch (an alternative to smartdrv.exe). although there are of course some differences when compared to caching memory, the main concepts and rules on what to do are the same. that included providing a kind of software tag-ram mechanism, which is done in hardware by the chipset or cpu for caching ram. nevertheless I went to the site you linked and all I can say is that what I have written is 100% in sync with what can be read over there.

when I read your post and compare it with mine I would say that we are quite close about what caching is about. what I am not sure about is whether you really distinguish between a) being able to cover the whole amount of installed ram and b) actually holding the data. it is crystal clear that a cache's size is only a fraction of main memory, that is the concept! if it wasn't, we could spare main memory and use 512mb of cache instead.

what does the author of the article you linked to have to say, after all the explanations he gave on what MAINBOARD caching is about, many pages deep within:

"Pentium Pro PCs use an integrated level 2 cache that contains the tag RAM within it, SO NONE OF THIS IS REALLY A CONCERN FOR THESE MACHINES. The Pentium Pro will cache up to 4 GB of main memory, basically anything you can throw at it."

the same is valid for K6-3 and "+" processors. they do have enough cache lines, their internal tag-ram bit-width is big enough.

but let us get back to the main issue (mystery) that sandra shows higher results when run multiple times. if sandra was a perfect tool it should not display this effect; it should be programmed to give an accurate result after one run (within some margin of error +/-2%).
it should avoid caching effects that are not within the limits it wants to show. these limits end when the benchmark is over. a second run - if properly carried out - should avoid possible caching effects that survive from one run to another. but of course to some degree it is beyond a program's capabilities to disable caching effects. there is a very good reason - namely caching! - that a second run gives better results. because - as you will agree - the very first run was without "positive" caching effects (except the effects that are bound to show up because the program, while running, accesses the same address locations multiple times), the caches were filled for the first time. so the second run can arguably show higher results because the caches still hold the data from the first run. but since sandra shows even better results with every consecutive run (up to some point), something smells fishy.

the first run leaves the cache-system in a state the second run may profit from, because its algorithm by chance can take advantage of what is already there. why should any additional run put the cache-system in an even more favourable condition for the next run? from a cache-technology and -implementation point of view there is no reason for this! what can the second, third, fourth etc. run "add" to the cache that was not there before? if that were the case then we really would have a problem here. that would mean that the caching implementation did not do its job correctly during the run before.
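
The argument above can be tried out on a toy model. The sketch below simulates a single fully associative LRU cache and replays the same access pattern several times, counting hits per pass. It is an illustration under heavy assumptions: real K6 and mainboard caches are set-associative, multi-level and shared with the OS, and the function names and sizes here are invented.

```python
from collections import OrderedDict

def run_pass(cache, capacity, addresses):
    """Replay one pass of accesses through an LRU cache and return the hit count."""
    hits = 0
    for address in addresses:
        if address in cache:
            cache.move_to_end(address)     # mark as most recently used
            hits += 1
        else:
            cache[address] = True          # fill the line on a miss
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return hits

def hits_per_pass(capacity, addresses, passes):
    """Run the same access pattern several times against one persistent cache."""
    cache = OrderedDict()
    return [run_pass(cache, capacity, addresses) for _ in range(passes)]

# Working set smaller than the cache: 64 lines swept 4 times per pass.
fits = hits_per_pass(128, list(range(64)) * 4, 3)

# Working set larger than the cache: 256 lines swept 4 times per pass.
thrash = hits_per_pass(128, list(range(256)) * 4, 3)

print("fits in cache:", fits)
print("too large:    ", thrash)
```

In this model the second pass gains over the first when the working set fits, and later passes are flat. When the working set is too large, every pass is equally poor. Neither case produces results that keep rising run after run.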

but since we obviously have the fact that sandra shows this strange behaviour, I assume 2 possible reasons, neither of which has to do with caching: a) sandra's membench is unreliable because of bad implementation, and/or b) we are dealing with some other still unknown "defect".

since we know that some other benchmarks (like everest) show better results because sandra was run many times just before, we can surely conclude that "caching" effects cannot be made responsible for that. because of that observation I would go for the b) option.

option b) is my favourite. I played around a lot with my p5a rev. 1.06 board quite a while ago. it is a well-known fact that the 1.06 revision has a lot of trouble with k6-+ cpus. I did not manage to overcome these problems but I found out something that was similar in effect to what we have here. the board performed great with a k6-3-400. as soon as I plugged in a k6-3-450 my everest write scores went down extremely. why, I don't know, but in the end I found out that if I kept the cpu temperature above 38° celsius my write scores were up and as high as expected! no joke! to keep the temperature at that level I ran the seti client in the background, which puts quite some load on the cpu and the memory-cache-system.

as stedman has already pointed out - all useful tweaks found so far have been posted, everybody is sharing knowledge. some even go so far as to openly admit "cheating". you are such a baaad trickster, stedman! :)

Post by Stedman5040 »

Well,

I had to get some raw data on this issue and have spent some time today benching superpi 1M with sandra2004 on the GA-5SMM board with the attached K6-III+ cpu. As promised the tweaks are as follows with wpcredit.

With onboard cache enabled

register 50 to (F0)
register 51 to (9A)
register 52 to (50)
register 55 to (0B)

with onboard cache disabled as above but

Register 51 is set at (1A)
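
The only difference between the two register sets is the value at register 51. A small sketch, assuming the register numbers and the values in brackets are hexadecimal (as WPCREDIT displays them), shows which bit changes. What that bit means in the SiS530 datasheet is not stated in this thread, so the code only reports the bit position. The names are illustrative.

```python
# Register values from the post, read as hexadecimal (assumption).
CACHE_ENABLED = {0x50: 0xF0, 0x51: 0x9A, 0x52: 0x50, 0x55: 0x0B}
CACHE_DISABLED = {**CACHE_ENABLED, 0x51: 0x1A}

def changed_bits(old, new):
    """Return the bit positions (0 = least significant) that differ."""
    diff = old ^ new
    return [bit for bit in range(8) if diff & (1 << bit)]

for reg in sorted(CACHE_ENABLED):
    on, off = CACHE_ENABLED[reg], CACHE_DISABLED[reg]
    if on != off:
        print(f"register {reg:02X}: {on:08b} -> {off:08b}, bits {changed_bits(on, off)}")
```

Only bit 7 of register 51 differs between the two sets, so the "cache disabled" variant is a one-bit change from the "cache enabled" one.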

The set up is as follows

K6-III+ cpu
Gigabyte GA-5SMM
10Gb ATA33 maxtor HDD
Matrox mystique PCI 4mb card
Voodoo2 accelerator
192mb PC133 dram (1x128mb and 1x64Mb) (2-3-2-6)
Intel 100+ nic

The method used was as follows for each run

1. Power up into windows
2. Set wpcredit tweaks
3. Run superpi 1M
4. Run Sandra2004 3x
5. Run superpi 1M
6. Power down

Results as follows

cpu setting           b4 sandra   Sandra memmark   after sandra

5.5x100 with cache    397s        152/152          344s
5.5x105 with cache    375s        162/160          331s
6.0x100 with cache    389s        153/151          338s

5.5x100 no cache      391s        157/153          351s
5.5x105 no cache      379s        165/162          333s
6.0x100 no cache      415s        151/150          343s

Every time Sandra is run we get a better result with superpi, usually in the order of 10 to 15% at these speeds and fsb. The Sandra scores for all of the above fell by about one or two points during the 3x runs rather than increasing, but it still results in a better run time for superpi. Must try it next with a K6-2/550 with no cpu L2 cache and maybe with a Pentium 200MMX. I can also try it out on the VA6 I have with a Celeron and PIII. Would be worth checking out on the P5A board I have as that also only has 512k of onboard cache.
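
For anyone who wants to check the size of the effect, here is a minimal sketch that works out the seconds saved and the percentage gain for each row of the results above. The helper name `gain` is made up, and the percentage is taken relative to the first (before Sandra) run, which is an assumption about how the gain is meant.

```python
# (cpu setting, onboard cache enabled, seconds before Sandra, seconds after Sandra)
RUNS = [
    ("5.5x100", True, 397, 344),
    ("5.5x105", True, 375, 331),
    ("6.0x100", True, 389, 338),
    ("5.5x100", False, 391, 351),
    ("5.5x105", False, 379, 333),
    ("6.0x100", False, 415, 343),
]

def gain(before_s, after_s):
    """Return (seconds saved, percent saved relative to the first run)."""
    saved = before_s - after_s
    return saved, 100.0 * saved / before_s

for setting, cache_on, before_s, after_s in RUNS:
    saved, pct = gain(before_s, after_s)
    label = "with cache" if cache_on else "no cache"
    print(f"{setting} {label:<10} {saved:3d} s saved ({pct:.1f}%)")

saved_all = [gain(b, a)[0] for _, _, b, a in RUNS]
print("smallest saving:", min(saved_all), "s, largest:", max(saved_all), "s")
```

On these figures the saving ranges from 40 to 72 seconds, roughly 10% to 17% of the first run's time.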

Stedman :lol:
lázy_kalabok

Post by lázy_kalabok »

try running microsoft word before the 1M bench - maybe it gets better as well? :)

Post by DonPedro »

@stedman and others too

thank you for digging into the sandra-bench-cheat-mystery.

because of the many systematic runs we see a lot of information here.

1. the mystery is not cache-related
2. sandra results do not necessarily need to show rising results to make the mystery work.
3. superpi likes cache
4. sandra does not like cache when 5.5 multiplier is used
5. the gain in speed is at least to me unexpectedly high: from 40 seconds up to 72 seconds!
6. houston we have a problem! what is it that makes it happen?

since this sandra-mystery and also the 5.5x multiplier effect are of public interest and deserve broader attention, may I humbly ask that somebody opens a new thread on this? it is also somewhat off-topic and does not fit very well here.

I have added a little more information to the numbers that makes evaluating the data easier.
Attachments
sandra bench mystery.png (3.62 KiB) Viewed 19829 times