Monday 10 August 2020

(2020) Learning Java - Sudoku Puzzle Solver

Retirement and Pandemic Isolation

The year 2020 will go down in history along with the Great Plague of London (1665-1666) and the Spanish Flu (1917-20). This was the year that the Corona Virus COVID-19, spread around the world in just a month or two. Australia was lucky with no land borders. All international arrivals came through controlled air and sea ports. So very early on, all international arrivals were placed in quarantine hotels for 14 days. 

But like all human systems, a couple of breaches occurred. A cruise ship, the Ruby Princess, sailed from Sydney, around New Zealand ports and back to Sydney. So Australian passengers who simply sailed the loop, might possibly not been deemed "international arrivals". But it just needed one infected waiter on board. With a number of failures in on-board medical reporting and discharge checking, all passengers were let off without quarantining.

In Melbourne, a break-down in quarantine management (untrained, poorly supervised guards), including a guard having sex with an infected quarantainee, lead to a rapid spread into community transmission, and over a couple of months, 700+ new cases per day and up to 20 deaths per day.

A state of disaster was declared and stage 4 stay-at-home restrictions and curfews were implemented.

So it is in this environment that I found myself in isolation looking for things to keep me occupied. The retirement village management circulated a fortnightly news sheet with pages of puzzles, including Sudoku. Now I have never been sufficiently patient to do Sudoku in the past, but with isolation, I decided to have a go. I soon developed a method to solve the very easy puzzles, but more complicated examples bored me. But the IT analyst in me was challenged as to how these puzzles could be solved by a program.

The Challenge of Learning Java

Now I wrote my first program in 1966, then more comprehensive programming of numerical methods at university in 1968, including a small job of data analysis of a geography research student's field data. In 1969 I studied Computer Science for my BSc, then post graduate research into database management. 

My first programming job was on a Hospital Biochemistry Department's Data Acquisition system. I haven't coded professionally since the mid 1990s, having moved on to application architecture, technical writing, quality assurance and project management. In data warehouse work, I used set oriented SQL 'programming' extensively, and personally in family history research, I taught myself web page programming (HTML, CSS, Javascript). 

Although I had use 'Simula' (an object oriented extension to 'Algol') back in the early 1970s, I have never kept up with any of the modern object oriented languages. So I set myself the goal of learning "Java", and a Sudoku solving program seemed an ideal choice to cut my teeth on this modern language, with sufficient intellectual stimulation.

Sudoku - The Target Problem

A Sudoku puzzle comprises a 9x9 grid, further subdivided into 9 sub-matrices of 3x3 cells. The goal is to place the digits 1..9 uniquely in each row, column and sub-matrix. A certain number of cells will be prefilled with digits that provide the clues in deducing what digits can be validly put in each empty cell. The digit in any cell, cannot occur in any other cell of the row, column or sub-matrix intersecting at the subject cell.

(An example "Hard" Sudoku Puzzle)

Sudoku Solution Techniques

Andrew Stuart's "Sudoku Wiki" has a most comprehensive summary of some 25 solution techniques and a range of solvers. He has also written a book, "The Logic of Sudoku". Now I only discovered this web site after I had coded my initial program - I found that I had implemented one of the simplest techniques from Andrew's list, Hidden (including Naked) Singles. I call these techniques "deductive" in that there is a logical process to deduce solution digits from the given clues and solved-to-date digits.

Now I wasn't really up to trying to work out how to code any of his more complex techniques, so I resorted to the "brute force" technique of "last choice".

In my deductive method, I build a set of data structures that:-

  • list all the candidate digits possible at a cell;
  • count the number of occurrences of each candidate digit in the cells of each row, column and sub-matrix.

As each solution or trial digit is placed in a cell, all the related data structures are updated, reducing the number of empty cells and number of candidates.

So whilst a "brute force" method recursively tries the digits 1..9 in every empty cell and checks to see whether that placement is safe/valid, I saw that I already had a list of valid candidates for every empty cell. So my modified recursive search method only iterates through the known valid candidates at an empty cell. Then, rather than recursing directly to the next empty cell, I could see that with a new digit inserted, the deductive method should be executed to find any other digits that can be inserted by deduction. This eliminates more empty cells, and reduces the number of candidates valid at the remaining empty cells. So when the deductive pass completes and control returns to the recursive search, the size of the recursive search space has in fact been reduced. In a diabolically difficult puzzle of say 43 empty cells, if on average each empty cell has 5 valid candidates, then a pure "brute force" method performs 5**43 safe solution tests.  My modified method of pruning empty cells and candidates progressively, reduces the recursion to something like 13 levels instead of 43 levels, and the number of candidates at each level also reduces.

Java Specific Coding Techniques

Another technique that I found in another language, and ported to Java, uses "bit masks". The bits of an integer are set to represent a digit - bit #0 through #8 represent the digits 1..9. There are three passes:-

  1. One bit mask integer for each row, column and sub-matrix, indicates which digits exist in the clues, irrespective of a digits position in the row, column or sub-matrix;
  2. Logical "Complementing" (change every '0' to '1' and vice versa) the above bit masks, indicates which digits do NOT exit in each row, column or sub-matrix. 
  3. Performing a logical AND of the three candidate bit masks intersecting at a cell, gives the list of valid candidates for that cell. (In the logical AND, for each bit position, the output bit is set ONLY if the same bit is set in ALL input masks).

If a cell has only a single candidate (a Naked Single) then that is in the solution - remember I keep a count of the number of occurrences of each candidate digit in each row/column/sub-matrix. 

If ANY of the candidate digits in a cell is the ONLY occurrence of the candidate in the candidates digits of a row, column or sub-matrix, that that digit (a Hidden Single) is in the solution.

In Java, the standard integer is 32 bits, so I immediately saw the potential for larger puzzle sizes (see blow).

Puzzle Expansion and Parameterization

For initial testing of my code, I coded a very simple 4x4 puzzle, using just the digits 1..4. I parameterized this puzzle size in my code. This got me thinking about what other size puzzle are possible. 

A 16x16 puzzle is possible, but no immediately simple selection of 'digits'/tokens is obvious. 16 is one too many for use of Hexadecimal 'digits' 1..F. Alternatively, A..P could be used. 

A 25x25 puzzle seems more intuitive, using the tokens A..Y (thus I gave this the name Nozeku - 'No-Zee-Ku'). I parameterized the sets of valid token characters which map to bit-matrix index digits 0..n. With a list of valid token characters, possibly alphabetic, I convert all input characters to upper-case. Any non-valid input token character can then be used to flag empty cells - I substitute them with '_' (underscore) as the standard internal empty cell representation and in the output solution.

 Where to Now?

After reading Andrew Stuart's web-site, I think my next extensions will include:

  • collection of some processing statistics that might help toward difficulty grading;
  • with my optimized recursive search method, the next step will be extend it to find ALL solutions (if more than one);
  • then I need to looking into coding a program to create random puzzles, especially of "Nozeku" type;

Having now tried a number of different Sudoku playing apps, I think I might extend my Java learning to cover GUI programming. What I have in mind is a Sudoku app version with options to show hints from various solution techniques. The idea is that it will then help users learn the more complex techniques.