StormChart (Call)

As promised, here is a preview of our third and most valuable statistic.
(Fold vs Call… Call vs Raise will come in the next post)

It shows whether a player is tight or loose, and whether he holds strictly to a strategy (foreseeable actions) or tries to disguise his hands…

The graphics are not very nice yet, as we just finished the back-end programming, but they serve the purpose.

Call StormCharts

To test these charts, each AI had to play over 500.000 games (12 player table like on July 3rd). Slight adjustments were made to the AIs, so the overall ranking from now on differs from the previous ones. The many games from this simulation clearly show that the AIs belong to different groups of strength levels:
Hephaistos (775k), Vulcanos (762k), Fortuna (751k), Hades (743k), Pluto (737k), Tyche (719k) — Flora (518k), Saturn (507k), Kronos (488k), Janus (485k), Ares (400k), Mars (332k), Nemesis (316k), Victoria (317k), Merkur (314k), Bellona (313k), Hermes (310k), Hera (274k), Nike (273k), Juno (237k), Luna (229k), Selene (192k), Zeus (119k), Uranus (90k), Athene (66k), Jupiter (57k), Minerva (56k), Apollo (38k), Aurora (-25k), Somnus (-33k), Eos (-40k), Morpheus (-53k) — Always Fold Test AI (-123k) — Faunus (-254k), Pan (-298k), Aphrodite (-306k), Venus (-307k), Neptun (-414k), Poseidon (-428k), Furiae (-571k), Erinyen (-628k), Helios (-646k), Sol (-682k) — Eros (-915k), Cupido (-928k) — Dionysos (-1650k), Bacchus (-1720k)

Debugging the Main Frame

In one of the previous posts, error prevention was mentioned.

While running each sub function through thousands of test cases proves very effective for most of the classes, this procedure is quite problematic on the main frame.
On the one hand, there are simply too many possible states that would need testing (the few thousand states of each sub function must be combined with those of all other sub functions). A sheer brute-force evaluation (testing every combination) would take ages.
On the other hand, most state combinations must be discarded in advance (a player cannot act after he folded… a player cannot check after another raised… etc.), otherwise the brute-force protocol would show millions of caught exceptions that are irrelevant (and the real problems would disappear below that enormous amount of junk).

So debugging the main frame needs other tactics: continuous monitoring of critical values, and double-checking results with different calculation algorithms.
See the following examples.

  • The bank system monitors how much money is located on each table (by tracking only what is withdrawn at buy-in and what is deposited when a player leaves a table). The dealer monitors the same amount (by tracking only the stack of each player and the pot). These two values are compared every round (and must be identical).
  • When no game is active (all money is located at the bank, none at the tables), the sum of all bank accounts must be 0.
  • The statistics of each player (number of folds/calls/raises or wins/losses) must in their sum correspond to the number of played games (this is even calculated per round, taking into account that later rounds have lower numbers due to folds or early wins).
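The first check above can be sketched in a few lines. This is only an illustration of the double-bookkeeping idea, not our engine's actual classes; all names here are made up:

```python
# Sketch of the first monitoring check: two independent bookkeeping
# paths for the same money, compared every round (names illustrative).

class Bank:
    """Tracks table money only via buy-ins and cash-outs."""
    def __init__(self):
        self.table_money = 0
    def buy_in(self, amount):
        self.table_money += amount      # withdrawn from a bank account
    def cash_out(self, amount):
        self.table_money -= amount      # deposited back to a bank account

class Dealer:
    """Tracks the same money independently via stacks and the pot."""
    def __init__(self):
        self.stacks = {}
        self.pot = 0
    def table_money(self):
        return sum(self.stacks.values()) + self.pot

def check_invariant(bank, dealer):
    # Both bookkeeping paths must agree after every round.
    assert bank.table_money == dealer.table_money(), "money leak detected"

bank, dealer = Bank(), Dealer()
bank.buy_in(1000)
dealer.stacks["player_1"] = 1000
check_invariant(bank, dealer)          # passes

dealer.stacks["player_1"] -= 50        # player bets 50 into the pot
dealer.pot += 50
check_invariant(bank, dealer)          # still passes: total unchanged
```

A bug that moves money on only one of the two paths (e.g. a pot that is paid out but never deducted) trips the assertion on the very next round.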

This is not a fail-safe check, as not all possible combinations are tested… but the examples above alone can already detect most calculation errors in handling the money and statistical numbers… and as these are widely interconnected with the game engine, that gets checked, too. So the blind spots can be greatly reduced by (1) adding monitoring programs wherever reasonable and (2) running as many games in simulation mode as possible.

As quality control is held very high, every time an exception flag is raised, its debugging gets prioritized over whatever new functions are currently being developed.
Since the alpha phase started in March, the list of open issues from exception flags was a constant pain in the a…; sometimes new issues came up way faster than old ones got solved. But yesterday the last known exception on this list was cleared, and a test run of 500.000 games (12 player table, pot limit, 120.000 games per AI) got through without a single flag coming up.
This does not mean we are finished (there are still many functions to be developed for the final version, and every step can create new exceptions), but getting rid of this additional weight lifts the morale and frees much energy for the next steps.


P.S.: Here is the ranking of Gen-1 AIs after this new test run (to be compared to the 8 player table results from last week): Janus (303k), Flora (289k), Hephaistos (258k), Merkur (238k), Tyche (228k), Hades (227k), Fortuna (224k), Hermes (220k), Vulcanos (210k), Pluto (189k), Saturn (180k), Kronos (174k), Selene (167k), Luna (153k), Minerva (125k), Juno (124k), Bellona (103k), Hera (101k), Athene (97k), Nike (95k), Victoria (76k), Ares (58k), Uranus (50k), Apollo (44k), Nemesis (36k), Zeus (15k), Somnus (7k), Mars (-4k), Morpheus (6k), Eos (-16k), Aurora (-26k), Always Fold Test AI (-30k), Jupiter (-52k), Pan (-53k), Faunus (-54k), Aphrodite (-144k), Venus (-145k), Sol (-186k), Helios (-200k), Poseidon (-227k), Neptun (-261k), Erinyen (-274k), Furiae (-289k), Eros (-337k), Cupido (-360k), Bacchus (-608k), Dionysos (-626k)

Write your own AI

The poker trainer app was initially written as an analysis tool for poker statistics and theories.
It takes a very long time to test a theory in real life, as many games are required to overcome the statistical variance. But if the theoretical basis is coded into an AI, a simulation can run it against many other AIs (parallel or older versions of the same theory as well as other theories) through several million games, so that a difference in the outcome becomes significantly visible.
Winning AIs will be used for further improvements, losing AIs will be discarded.
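A toy illustration of why so many games are needed (the numbers below are made up, not from our simulations): per-game results are very noisy, so a small edge only separates from the noise once the standard error has shrunk well below it.

```python
# Toy example (made-up numbers): a 0.05 edge per game with a per-game
# standard deviation of 10 is invisible over 100 games, but emerges
# over 100.000 games, since the standard error falls as 10/sqrt(n).
import random
import statistics

random.seed(42)

def play_game(edge):
    # Net result of one game: high variance, small positive expectation.
    return random.gauss(edge, 10.0)

def mean_after(n_games, edge):
    return statistics.mean(play_game(edge) for _ in range(n_games))

few = mean_after(100, edge=0.05)       # standard error ~ 1.0
many = mean_after(100_000, edge=0.05)  # standard error ~ 0.03
print(f"mean after 100 games:     {few:+.3f}")
print(f"mean after 100.000 games: {many:+.3f}")
```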

The following template is the base interface for AIs to run on our poker trainer platform.

AI_Template v01
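The attachment above is the authoritative template; purely as an illustration of its general shape, an AI plugin interface could look like the sketch below. Every name in it (`PokerAI`, `act`, the parameters) is an assumption for this example, not the actual AI_Template v01 API:

```python
# Illustrative sketch only -- the real template is AI_Template v01
# above; all names here are assumptions, not the platform's API.
from abc import ABC, abstractmethod

class PokerAI(ABC):
    """Minimal plugin shape: the engine pushes the game state in,
    the AI returns one of the legal actions."""

    @abstractmethod
    def act(self, hole_cards, board, pot, to_call, legal_actions):
        """Return 'fold', 'call', or ('raise', amount)."""

class AlwaysFoldAI(PokerAI):
    # Mirrors the 'Always Fold Test AI' from the rankings: folds
    # whenever money must be put in, otherwise checks.
    def act(self, hole_cards, board, pot, to_call, legal_actions):
        if to_call == 0 and "call" in legal_actions:
            return "call"              # a check is a call of 0
        return "fold"

ai = AlwaysFoldAI()
print(ai.act(["Ah", "Kd"], [], pot=30, to_call=20,
             legal_actions=["fold", "call", "raise"]))  # fold
```

Keeping the interface this narrow is what lets the simulator seat dozens of competing AIs at the same table without any of them knowing about the others.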

Preventing Programming Errors

One of the most time consuming tasks in the development of our poker app is quality control…

– commenting every segment (to prevent misunderstandings on updates)
– long code review sessions
– equipping each function with detailed data monitoring code (e.g. at each interface)
– writing thousands of test cases
– test runs, test runs and more test runs
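The test-case approach boils down to feeding a function many known input/output pairs and flagging any deviation. A minimal sketch (the function and its name are hypothetical, not the app's real API):

```python
# Sketch of a table-driven test case run (hypothetical function).

def pot_odds(pot, to_call):
    """Fraction of the final pot the caller has to pay."""
    return to_call / (pot + to_call)

TEST_CASES = [
    # (pot, to_call, expected)
    (100, 50, 1 / 3),
    (80, 20, 0.2),
    (0, 10, 1.0),
]

for pot, to_call, expected in TEST_CASES:
    result = pot_odds(pot, to_call)
    assert abs(result - expected) < 1e-9, (pot, to_call, result)
print("all test cases passed")
```

Thousands of such triples per function are cheap to generate once, and every later code change gets re-checked against all of them.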

This is especially important for poker programs, as every bug in the AI will result in exploitable behaviour or other problems that spoil the fun of the game.
Numerous bugs in every single one of the available offline poker apps (!) were the reason this project was started, so special care is taken not to make the same mistakes.

Quality Control

Analyse Hand Value

To start a poker program, a function is required that converts any number of cards (in Texas Hold'em 5 to 7 cards) into a hand value with a defined order.
This is usually represented by an integer where a higher value means a better hand (at showdown a simple comparison is done, and the highest value wins).

The attached hand ranking function might not be the fastest way to do so (the poker app does not have to check millions of hands per second as online poker bots do), but it does its job and is easy to understand for debugging and teaching purposes.
In case you think your code is better, feel free to email us or post it in the comments; we are open for discussion.

Hand Ranking