In one of the previous posts, error prevention was mentioned.
While running each subfunction through thousands of test cases proves very effective for most of the classes, this procedure is quite problematic for the main frame.
On the one hand, there are simply too many possible states that would need testing (the few thousand states of each subfunction must be combined with those of all other subfunctions). A sheer brute-force evaluation (testing every combination) would take ages.
On the other hand, most state combinations must be discarded in advance (a player cannot act after he has folded… a player cannot check after another has raised… etc.), otherwise the brute-force protocol would show millions of caught exceptions that are irrelevant (and the real problems would disappear beneath that enormous amount of junk).
So debugging the main frame needs other tactics: continuous monitoring of critical values and double-checking results with different calculation algorithms.
See the following examples.
- The bank system monitors how much money is located on each table (by counting only what is withdrawn on buy-in and what is deposited when a player leaves a table). The dealer monitors the same amount (by counting only the stack of each player and the pot). These two values are compared each round (and must be identical).
- When no game is active (all money is located at the bank, none at the tables), the sum of all bank accounts must be 0.
- The statistics of each player (number of folds/calls/raises or wins/losses) must correspond in their sum to the number of played games (this is even calculated per round, taking into account that later rounds have lower numbers due to folds or early wins).
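
The first two checks can be sketched roughly as follows. This is only a minimal illustration, not the actual Stormcastle.eu code: the names `Bank`, `Table`, `check_table_money`, `check_bank_idle` and `InvariantViolation` are made up for this example, and it assumes that every bank account starts at 0 and may go negative (so accounts hold net winnings).

```python
from __future__ import annotations
from dataclasses import dataclass, field


class InvariantViolation(Exception):
    """Raised when two independent calculations disagree."""


@dataclass
class Bank:
    # Hypothetical bank: accounts start at 0 and hold net winnings (assumption).
    accounts: dict[str, int] = field(default_factory=dict)
    # Net money moved to each table, counted only from buy-ins and cash-outs.
    on_table: dict[str, int] = field(default_factory=dict)

    def buy_in(self, player: str, table: str, amount: int) -> None:
        self.accounts[player] = self.accounts.get(player, 0) - amount
        self.on_table[table] = self.on_table.get(table, 0) + amount

    def cash_out(self, player: str, table: str, amount: int) -> None:
        self.accounts[player] = self.accounts.get(player, 0) + amount
        self.on_table[table] = self.on_table.get(table, 0) - amount


@dataclass
class Table:
    # Hypothetical dealer view: knows only the stacks and the pot.
    name: str
    stacks: dict[str, int] = field(default_factory=dict)
    pot: int = 0

    def chips_in_play(self) -> int:
        return sum(self.stacks.values()) + self.pot


def check_table_money(bank: Bank, table: Table) -> None:
    """Compare the bank's ledger with the dealer's count for one table."""
    expected = bank.on_table.get(table.name, 0)
    actual = table.chips_in_play()
    if expected != actual:
        raise InvariantViolation(
            f"{table.name}: bank says {expected}, dealer counts {actual}"
        )


def check_bank_idle(bank: Bank) -> None:
    """With no money on any table, all accounts must sum to 0."""
    total = sum(bank.accounts.values())
    if total != 0:
        raise InvariantViolation(f"accounts sum to {total}, expected 0")


# Tiny simulated hand with two made-up players.
bank = Bank()
table = Table("T1")
for player in ("alice", "bob"):
    bank.buy_in(player, "T1", 100)
    table.stacks[player] = 100

# alice bets 30, bob calls; checked once during the round.
table.stacks["alice"] -= 30
table.stacks["bob"] -= 30
table.pot += 60
check_table_money(bank, table)

# bob wins the pot; checked again.
table.stacks["bob"] += table.pot
table.pot = 0
check_table_money(bank, table)

# Both players leave the table, so the bank must be balanced again.
for player in list(table.stacks):
    bank.cash_out(player, "T1", table.stacks.pop(player))
check_table_money(bank, table)
check_bank_idle(bank)
```

The point of the design is that the two totals come from code paths that share no data: the bank never looks at stacks or pots, and the dealer never looks at the ledger. A chip that is lost or duplicated anywhere in the game engine makes the two numbers drift apart and raises a flag in the same round.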
This is not a fail-safe check, as not all possible combinations are tested… but the examples above alone can already detect most calculation errors in handling the money and statistical numbers… and as these are widely interconnected with the game engine, the engine gets checked, too. So the blind spots can be greatly reduced by (1) adding monitoring programs wherever reasonable and (2) running as many games in simulation mode as possible.
As quality control is held very high at Stormcastle.eu, every time an exception flag is shown, its debugging gets prioritized over whatever new functions are currently being developed.
Since the alpha phase started in March, the list of open issues from exception flags has been a constant pain in the a…; sometimes new issues came up way faster than old ones got solved. But yesterday the last known exception on this list was cleared, and a test run of 500,000 games (12-player table, pot limit, 120,000 games per AI) got through without a single flag coming up.
This does not mean we are finished (there are still many functions that need to be developed for the final version, and every step can create new exceptions), but getting rid of this additional weight lifts morale and frees up much energy for the next steps.
P.S.: Here is the ranking of Gen-1 AIs after this new test run (to be compared to the 8-player table results last week): Janus (303k), Flora (289k), Hephaistos (258k), Merkur (238k), Tyche (228k), Hades (227k), Fortuna (224k), Hermes (220k), Vulcanos (210k), Pluto (189k), Saturn (180k), Kronos (174k), Selene (167k), Luna (153k), Minerva (125k), Juno (124k), Bellona (103k), Hera (101k), Athene (97k), Nike (95k), Victoria (76k), Ares (58k), Uranus (50k), Apollo (44k), Nemesis (36k), Zeus (15k), Somnus (7k), Mars (-4k), Morpheus (6k), Eos (-16k), Aurora (-26k), Always Fold Test AI (-30k), Jupiter (-52k), Pan (-53k), Faunus (-54k), Aphrodite (-144k), Venus (-145k), Sol (-186k), Helios (-200k), Poseidon (-227k), Neptun (-261k), Erinyen (-274k), Furiae (-289k), Eros (-337k), Cupido (-360k), Bacchus (-608k), Dionysos (-626k)