Ormulogun: opening book generation & pretty graphs

Ormulogun now has an opening book built into the board view. Give it a try (use "Open book" in the tools menu). I didn't find an opening book that wasn't potentially covered by the GPL (and I don't want to re-license Ormulogun under the GPL), so I had to generate my own from a list of games (the general consensus seems to be that chess games and positions aren't subject to copyright).

Pretty graphs

Because an opening book is essentially a huge graph of chess positions, I tried using Graphviz to see which openings appear most often, and the transpositional power of different opening "families". I am quite pleased with the results. (The SVG files below are interactive: click on a node to open the board at that position; hover over an edge to see the percentages.)

The gen-book-graph tool was used to generate the .dot files.

Heavy pruning (positions with >1% frequency)

Medium pruning (positions with >0.1% frequency)

No pruning (all positions)

Book generation

Generating an opening book is fairly easy: play through each game, record the position after every move, and keep track of whether the game ended in a win for white, a draw, or a win for black. After you've played through many games, filter out statistically insignificant positions from the position list, and you have an opening book. Doing it quickly was an interesting challenge, and even more so when the whole dataset doesn't fit in memory.
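
Concretely, the tallying part boils down to mapping each finished game's PGN Result tag to one of three counters. A minimal sketch in C (parse_result and the outcome names are illustrative, not necessarily what Ormulogun does):

#include <string.h>

/* Which of the three counters a finished game contributes to. */
enum outcome { WHITE_WIN, DRAW, BLACK_WIN, UNKNOWN };

/* Map a PGN Result tag ("1-0", "0-1", "1/2-1/2", "*") to an outcome. */
static enum outcome parse_result(const char *tag)
{
	if (strcmp(tag, "1-0") == 0) return WHITE_WIN;
	if (strcmp(tag, "0-1") == 0) return BLACK_WIN;
	if (strcmp(tag, "1/2-1/2") == 0) return DRAW;
	return UNKNOWN; /* unfinished or unknown result: skip the game */
}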

Generating position lists

The first step is to split the set of games into reasonably-sized "chunks". Each chunk will be fed to a worker, which will play through games and record positions in a Judy array. (Why Judy arrays? Because they're fast, memory-efficient, and give you sorted traversal "for free", which is important for the merging step later.) The chunks must not be too large, or else the set of all reached positions will outgrow the system's available memory. A rough estimate of memory usage is five times the size of the input PGN (i.e. 100 MiB of PGN games will require about 500 MiB of memory to hold all the positions).
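
As a rough sketch of what a worker's bookkeeping can look like (using the string-keyed JudySL variant; the struct and function names here are mine, not gen-book's):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <Judy.h>

struct wdl { unsigned w, d, l; }; /* per-position win/draw/loss counters */

static Pvoid_t book = (Pvoid_t)NULL; /* JudySL: FEN string -> struct wdl * */

/* Record one game outcome (+1 white win, 0 draw, -1 black win) for a position. */
static void record(const char *fen, int outcome)
{
	Word_t *pv;
	JSLI(pv, book, (const uint8_t *)fen); /* insert or find; *pv is 0 when new */
	if (*pv == 0)
		*pv = (Word_t)calloc(1, sizeof(struct wdl));
	struct wdl *c = (struct wdl *)*pv;
	if (outcome > 0) ++c->w;
	else if (outcome == 0) ++c->d;
	else ++c->l;
}

/* Dump all positions in key order -- the sorted traversal we get "for free",
 * which is what makes the streaming merge step possible. */
static void dump(void)
{
	uint8_t fen[256] = "";
	Word_t *pv;
	JSLF(pv, book, fen); /* first key >= "" */
	while (pv != NULL) {
		struct wdl *c = (struct wdl *)*pv;
		printf("%u\t%u\t%u\t%s\n", c->w, c->d, c->l, fen);
		JSLN(pv, book, fen); /* next key in sorted order */
	}
}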

The tools that perform this first step are gen-book (and gen-eco). They link against Gumble to play through the games and simply write the sorted list of positions, with the corresponding numbers of wins/draws/losses, to standard output. They are reasonably fast, at around 400 games per second per core on my machine:

% time zstdcat input.pgn.zst | pv | chunk-pgn | head -n 100000 | gen-book >/dev/null
90.4MiB 0:00:19 [4.65MiB/s]
gen-book 24.62s user 0.41s system 98% cpu 25.324 total

An earlier version was a PHP script that used proc_open() to communicate with Gumble. The C version is about 10 times faster, as there is no IPC involved.

A real-world example:

% time zstdcat lichess_db_standard_rated_2018-01.pgn.zst | pv | chunk-pgn | parallel --block-size 100M --pipe --files --compress --compress-program pzstd --tmpdir ./tmp.ZrGK4YG1wP gen-eco >/dev/null
35.6GiB 0:07:40 [79.2MiB/s]
parallel 4139.77s user 381.71s system 961% cpu 7:50.49 total

CPU usage below 1600% and an average speed of around 80 MiB/s seem to indicate that parallel --pipe has become the bottleneck (the manual says it can only do about 100 MiB/s). Using --pipepart is not possible here, as the input is not a seekable file.

Merging position lists

Now that each worker has worked through its chunk of games, we need to merge all their outputs into one huge list of positions, with win/draw/loss counts summed across all the chunks. This is where we rely on the sortedness of the position lists: merging two (or more) sorted lists is really easy and takes O(1) memory. The merge-books tool does just that.
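
Here is a minimal two-way merge sketch, assuming the line format shown in the Results section below (three tab-separated counts, then the position key) and inputs sorted by key. The real merge-books accepts any number of inputs, but the principle is identical:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct entry {
	unsigned long w, d, l;
	char key[512]; /* FEN, optionally followed by "\tmove" */
	int eof;
};

/* Read one "w<TAB>d<TAB>l<TAB>key" line; flag eof at end of stream. */
static void read_entry(FILE *f, struct entry *e)
{
	char line[1024];
	if (!fgets(line, sizeof line, f)) { e->eof = 1; return; }
	char *p = line;
	e->w = strtoul(p, &p, 10);
	e->d = strtoul(p, &p, 10);
	e->l = strtoul(p, &p, 10);
	while (*p == '\t') ++p;
	p[strcspn(p, "\n")] = '\0';
	snprintf(e->key, sizeof e->key, "%s", p);
	e->eof = 0;
}

static void emit(const struct entry *e)
{
	printf("%lu\t%lu\t%lu\t%s\n", e->w, e->d, e->l, e->key);
}

int main(int argc, char **argv)
{
	if (argc != 3) { fprintf(stderr, "usage: %s a.tsv b.tsv\n", argv[0]); return 1; }
	FILE *a = fopen(argv[1], "r"), *b = fopen(argv[2], "r");
	if (!a || !b) { perror("fopen"); return 1; }
	struct entry ea, eb;
	read_entry(a, &ea);
	read_entry(b, &eb);
	while (!ea.eof || !eb.eof) {
		/* Compare keys; an exhausted stream always loses. */
		int cmp = ea.eof ? 1 : eb.eof ? -1 : strcmp(ea.key, eb.key);
		if (cmp == 0) { /* same position in both inputs: sum the counts */
			ea.w += eb.w; ea.d += eb.d; ea.l += eb.l;
			emit(&ea);
			read_entry(a, &ea);
			read_entry(b, &eb);
		} else if (cmp < 0) {
			emit(&ea);
			read_entry(a, &ea);
		} else {
			emit(&eb);
			read_entry(b, &eb);
		}
	}
	return 0;
}

Because each input is consumed strictly front to back, only the two current entries are ever held in memory, which is where the O(1) bound comes from.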

Because position lists are huge files of mostly redundant data, they are stored on disk compressed with pzstd, which gives a great compression ratio considering its speed (at its default compression level of 3):

% wc -c chunk.tsv.zst <(zstdcat chunk.tsv.zst)
83078231 chunk.tsv.zst
565000968 /proc/self/fd/11

That's a compression ratio of about 7. Because the tools need to read the uncompressed data, and because we don't want to waste time and disk space on temporary files, the commands make heavy use of process substitution. Decompression is fairly cheap here and will not be the bottleneck, as zstdcat achieves around 500 MiB/s of decompression speed per core regardless of the compression level. To keep the commands simple, I wrote a small zstdize tool which transparently substitutes compressed files in a command:

# This command
zstdize foo bar -- baz1.zst baz2.zst baz3.zst
# will execute:
foo bar <(zstdcat baz1.zst) <(zstdcat baz2.zst) <(zstdcat baz3.zst)

Let's look at the performance of merge-books for merging 8 compressed chunks:

% time zstdize merge-books 0 -- chunk{0..7}.tsv.zst | pv -l >/dev/null
59.6M 0:01:02 [ 949k/s]

About 1 million positions merged per second, which is great, but merge-books only maxes out a single core; the merging step has become the bottleneck of the whole process. Can we make it use more than one core? Yes, with even more process substitution and nested invocations of merge-books:

% time merge-books 0 <(zstdize merge-books 0 -- chunk{0..3}.tsv.zst) <(zstdize merge-books 0 -- chunk{4..7}.tsv.zst) | pv -l >/dev/null
59.6M 0:00:58 [1.02M/s]

OK, only a very modest gain here. Let's look at a real-world example with a larger number of chunks. The merge-books-mt tool automates generating a command like this, nesting multiple times if necessary.

% ls chunks/ | wc -l
374

% time zstdize merge-books 0 -- chunks/* | pv -l >/dev/null
686M 1:55:02 [99.4k/s]

% time zstdize merge-books-mt 0 -- chunks/* | pv -l >/dev/null
686M 0:15:47 [ 723k/s]

On real-world data the speed-up is far more noticeable: the nesting trick turns a run of nearly 2 hours into one of 16 minutes, a factor of about 7. Still, this way of achieving parallelization isn't perfect, as the main merging process (at the root of the nesting tree) remains the bottleneck.

Results

The opening book is a TSV (tab-separated values) file that looks like:

...
12021   9548    11951   rnbqkbnr/pp1p1ppp/4p3/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq -
533     424     616     rnbqkbnr/pp1p1ppp/4p3/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq -    b1c3
543     411     465     rnbqkbnr/pp1p1ppp/4p3/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq -    b2b3
829     806     850     rnbqkbnr/pp1p1ppp/4p3/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq -    c2c3
315     317     360     rnbqkbnr/pp1p1ppp/4p3/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq -    c2c4
45      39      45      rnbqkbnr/pp1p1ppp/4p3/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq -    d1e2
603     418     635     rnbqkbnr/pp1p1ppp/4p3/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq -    d2d3
8443    6515    8376    rnbqkbnr/pp1p1ppp/4p3/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq -    d2d4
646     571     465     rnbqkbnr/pp1p1ppp/4p3/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq -    g2g3
177     132     139     rnbqkbnr/pp1p1ppp/4p3/2p5/4P3/5N2/PPPPQPPP/RNB1KB1R b KQkq -
161     125     120     rnbqkbnr/pp1p1ppp/4p3/2p5/4P3/5N2/PPPPQPPP/RNB1KB1R b KQkq -    b8c6
648     572     467     rnbqkbnr/pp1p1ppp/4p3/2p5/4P3/5NP1/PPPP1P1P/RNBQKB1R b KQkq -
426     388     344     rnbqkbnr/pp1p1ppp/4p3/2p5/4P3/5NP1/PPPP1P1P/RNBQKB1R b KQkq -   b8c6
99      106     56      rnbqkbnr/pp1p1ppp/4p3/2p5/4P3/5NP1/PPPP1P1P/RNBQKB1R b KQkq -   d7d5
...

The first field is the number of wins for white, the second the number of draws, and the third the number of wins for black, all for the position that follows. The last field, if present, is the move that was played in that position.
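
From these counts one can derive the kind of numbers shown on the graph edges. For example, using the d2d4 line from the excerpt above (the formulas and function are mine, with the conventional 1/0.5/0 scoring):

#include <stdio.h>

/* Sketch: statistics derivable from one book line (w/d/l) relative to the
 * parent position's total game count. */
static void print_stats(unsigned w, unsigned d, unsigned l, unsigned parent_total)
{
	double total = w + d + l;
	printf("frequency: %.1f%%\n", 100.0 * total / parent_total);
	printf("white score: %.1f%%\n", 100.0 * (w + 0.5 * d) / total);
	printf("draw rate: %.1f%%\n", 100.0 * d / total);
}

int main(void)
{
	/* The first position line above totals 33520 games;
	 * d2d4 was played in 23334 of them. */
	print_stats(8443, 6515, 8376, 12021 + 9548 + 11951);
	return 0;
}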

The ECO book is a TSV file that looks like:

A00     Amar Gambit             rn1qkbnr/ppp2ppp/8/3p4/5p2/6PB/PPPPP2P/RNBQK2R w
A00     Amar Opening: Paris Gambit              rnbqkbnr/ppp2ppp/8/3pp3/5P2/6PN/PPPPP2P/RNBQKB1R b
A00     Anderssen Opening               rnbqkbnr/pppppppp/8/8/8/P7/1PPPPPPP/RNBQKBNR b
A00     Anderssen Opening: Polish Gambit                rnbqkbnr/1ppppppp/8/p7/1P6/P7/2PPPPPP/RNBQKBNR b
A00     Barnes Opening: Fool's Mate             rnb1kbnr/pppp1ppp/8/4p3/6Pq/5P2/PPPPP2P/RNBQKBNR w
A00     Barnes Opening: Gedult Gambit #2                rnbqkbnr/ppppp1pp/8/8/4p3/2N2P2/PPPP2PP/R1BQKBNR b
A00     Barnes Opening: Walkerling              rnbqkb1r/pppp1ppp/5n2/4p3/2B1P3/5P2/PPPP2PP/RNBQK1NR b
A00     Clemenz Opening         rnbqkbnr/pppppppp/8/8/8/7P/PPPPPPP1/RNBQKBNR b
A00     Clemenz Opening: Spike Lee Gambit               rnbqkbnr/ppppppp1/8/7p/6P1/7P/PPPPPP2/RNBQKBNR b
A00     Crab Opening            rnbqkbnr/pppp1ppp/8/4p3/P6P/8/1PPPPPP1/RNBQKBNR b
A00     Creepy Crawly Formation: Classical Defense              rnbqkbnr/ppp2ppp/8/3pp3/8/P6P/1PPPPPP1/RNBQKBNR w
A00     Formation: Hippopotamus Attack          r1bq1rk1/ppp2ppp/2nb1n2/3pp3/8/PPPPPPP1/7P/RNBQKBNR b
A00     Gedult's Opening                rnbqkbnr/pppppppp/8/8/8/5P2/PPPPP1PP/RNBQKBNR b
A00     Global Opening          rnbqkbnr/pppp1ppp/8/4p3/8/P6P/1PPPPPP1/RNBQKBNR b
...

The format is self-explanatory. These files can be found in the ormulogun-ext repository.