My guiding principles, both separately and in conjunction, present some interesting challenges. Here are the optimization puzzles I'm currently working on.
Minimizing Clock Per Instruction
Modifying a value in RAM involves moving the data from RAM to the Data Register, in/decrementing the Data Register, and moving the data from the Data Register back to RAM. Since Brainfuck programs usually have several +/- in a row, the data only needs to be transferred at the beginning and end of a run.
To state more clearly, you only need to write to RAM when the Data Register has been incremented/decremented and the memory location is about to change. Similarly, you only need to read from RAM if the memory location has changed since the previous read. The story gets a bit more complicated with the I/O instructions, but that's for another day.
As an alternative to the conditional read/write, I've been toying with the idea of putting the read/writes and inc/decs out of phase. So the RAM would be read/written on rising clock, and inc/dec would happen on falling clock. Coincidentally (or more likely by foresight of the 7400 series designers), the control signals to the RAM and counters already follow this logic. I haven't followed this line of thinking very far though so I'm not sure where it will lead.
Zeroing RAM
When a Brainfuck program begins, all memory locations have to start at 0. One option would be to keep a "Clear Pointer" of the highest accessed address. Whenever the Data Pointer goes up, if it is higher than the Clear Pointer, write a 0 to that location. With some clever micro-instruction logic, this would only take 2 extra cycles the first time you access a RAM location.
A simpler (and somewhat less fun) option would be to have the reset circuitry walk through RAM with a fast clock and set it all to 0 before running the program. I feel like this better fits the "Minimal number of execution steps per instruction" principle, and more directly reflects the execution of an interpreted/complied Brainfuck program on a regular computer.
Fixing a Major Brainfuck
If you've been paying really close attention so far and, and if you have an intimate working knowledge of Brainfuck, you may have noticed some hand waiving in my early posts about constant time looping. My logisim implementation had a 'uge mistake in its implementation of the Brainfuck language, and I didn't notice until long after I'd started on the breadboard project.
The loop construct, [ ... ], is represented in C as while (*data_pointer) { ... }. This means that as the loop is entered, the current RAM value is read. If it's 0, the contents of the loop are skipped. Otherwise, the contents of the loop run. At the end of the loop, if the contents of RAM at the current location is NOT zero, then it jumps back to the beginning of the loop.
Unfortunately, the way I built it in logisim would be represented by do { ... } while (*data_pointer);. This means the contents of the loop always run at least once. An open question is whether this change breaks the language's Turing Completeness. Another open question is what do we call this new language? But those questions are for another time.
Does this problem mean I have to sacrifice [ to the O(n) gods? Not necessarily... If I'm planning to do the "zero RAM on reset" strategy, I could throw in a parallel "index all the loop locations". I would have a bit of circuitry walk through program memory looking for all the [ and ]. In another RAM, at each address where there is a [ in the program, write the address of the corresponding ]. This reset circuitry could use the existing Stack RAM to balance the brackets.
So that's where I am as of this moment. The replacement breadboards should be here today. Yay!
Edit
On second thought, the index could be used for both [ and ], and the stack would only be necessary during analysis. That would make the control circuitry more consistent.
Part 1 (first) - Part 4 (prev) - Part 6 (next)