Hypothetical - Huge Text Adventure in 48K

Post by **Badger** » Tue Sep 15, 2020 8:18 pm

The thread about a new text adventure coming soon set me thinking about whether you could squeeze more out of 48K than first thought.

I'll outline what I was thinking and my possible solution. It's all hypothetical, and it may have been done before anyway, but there's nothing like re-inventing the wheel and having a glass of something while doing it.

OK, here we go. Text adventures rely on creating an atmosphere using only words, so it stands to reason that the more words you can use, the better. That can mean longer descriptive text and/or more locations.

Each letter (including spaces and punctuation) takes up 1 byte of memory. What if you could fit 2 letters into a single byte?

We can do this by using letter pairs.

Unfortunately, we have an 8-bit machine, so a single byte can only hold a number between 0 and 255 (256 values), and using 2 bytes to store the information would use the same amount of memory as just storing each letter individually. With 26 letters in the alphabet, plus 2 punctuation marks (space and full stop), there would be 28*28 = 784 combinations (aa, ab, ac ... fg ... za ... zz).

If we take the square root of 256 (16), this means we could only have 14 letters plus a space and a full stop. Alternatively, we could increase the number of letters and discard unused letter pairs such as "QQ" and "AA". It could even be possible to create an "AA" with the pairings. For example, "beast of traal" breaks down to "be", "as", "t ", "of", " t", "ra", "al", so our 14-character string can be stored as 7 letter-pair bytes.
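
A minimal sketch of the pair idea, in Python rather than Oric BASIC purely to show the logic. It assumes a hypothetical 16-symbol alphabet (14 letters plus space and full stop) chosen so that "beast of traal" can be encoded. With exactly 16 symbols, the pair number can simply be computed as 16 × first + second, so the two letters are the high and low nibbles of the byte.

```python
# Hypothetical 16-symbol alphabet: 14 letters plus space and full stop.
ALPHABET = " .abdefhilnorstu"
assert len(ALPHABET) == 16  # 16 * 16 = 256 pairs, one byte each

def encode_pairs(text: str) -> bytes:
    """Pack text two characters per byte: byte = 16 * first + second.
    Raises ValueError for characters outside ALPHABET."""
    if len(text) % 2:
        text += " "  # pad odd-length text with a trailing space
    out = bytearray()
    for i in range(0, len(text), 2):
        first = ALPHABET.index(text[i])
        second = ALPHABET.index(text[i + 1])
        out.append(16 * first + second)
    return bytes(out)

def decode_pairs(data: bytes) -> str:
    """Unpack: the high nibble is the first letter, the low nibble the second."""
    return "".join(ALPHABET[b >> 4] + ALPHABET[b & 0x0F] for b in data)

packed = encode_pairs("beast of traal")
print(len(packed), list(packed))  # 7 [53, 45, 224, 182, 14, 194, 41]
print(decode_pairs(packed))       # beast of traal
```

With the full 16×16 grid no lookup table is needed at all; a table only becomes necessary once unused pairs are discarded to make room for more letters.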

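Obviously we would need a lookup table for the pairs, and this would take up 3*256 bytes (or 2*256 if the pair number is simply the position in the table, since it then doesn't need to be stored). Something like: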

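| Pair | 1st letter | 2nd letter |
|------|------------|------------|
| 0 | a | space |
| 1 | a | b |
| 2 | a | c |
| ... | ... | ... |
| 17 | b | e |
| ... | ... | ... |
| 254 | z | y |
| 255 | z | z |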

And then a routine that converts the stored letter pairs back into text for display. Our "beast of traal" example could then look something like:

DATA 17,13,153,98,210,176,10

Obviously these numbers are just picked randomly, but each would refer to a letter pair in the table.
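
A sketch of the table-driven decode routine, again in Python for readability. It assumes the pair table is stored as two hypothetical 256-byte arrays, `FIRST` and `SECOND`, indexed by pair number (so 512 bytes rather than 3*256). The pair assignments are made up so that the example DATA values above decode to "beast of traal".

```python
# Two parallel 256-entry tables: FIRST[n] and SECOND[n] hold the two
# characters of pair number n. The pair number is the position in the
# table, so it is never stored (2 * 256 = 512 bytes in total).
FIRST = bytearray(b"?" * 256)
SECOND = bytearray(b"?" * 256)

def set_pair(n: int, pair: str) -> None:
    FIRST[n], SECOND[n] = ord(pair[0]), ord(pair[1])

# Hypothetical assignments matching the example DATA values.
for n, pair in {17: "be", 13: "as", 153: "t ", 98: "of",
                210: " t", 176: "ra", 10: "al"}.items():
    set_pair(n, pair)

def decode(data: list[int]) -> str:
    """Expand each stored byte into its two characters via the tables."""
    out = bytearray()
    for n in data:
        out.append(FIRST[n])
        out.append(SECOND[n])
    return out.decode("ascii")

DATA = [17, 13, 153, 98, 210, 176, 10]
print(decode(DATA))  # beast of traal
```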

To actually create the data, we could use something like a simple web page with a text box validated by a regular expression along the lines of "/^[abcdefghiklmnoprstuvwy .]+$/" (note the missing letters j, q, x and z). We could then submit that to a PHP script, for instance, that would break down the text and generate the letter pair table and the data table for the text, or generate an error like "too many pair combinations", or report "you have used 234 letter pairs out of 256", etc.
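
A rough sketch of that offline tool, written in Python instead of PHP. It validates each text against the character set above, splits it into pairs, numbers each new pair in order of first use, and stops with an error once more than 256 distinct pairs are needed. The function name `build_pair_data` and the sample texts are hypothetical.

```python
import re

# Letters from the post (j, q, x, z left out) plus space and full stop.
VALID = re.compile(r"^[abcdefghiklmnoprstuvwy .]+$")

def build_pair_data(texts: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return (pair table, one DATA list per text) or raise ValueError."""
    table: list[str] = []        # pair number -> two-character string
    index: dict[str, int] = {}   # two-character string -> pair number
    all_data = []
    for text in texts:
        if not VALID.match(text):
            raise ValueError(f"invalid character in {text!r}")
        if len(text) % 2:
            text += " "  # pad to an even length
        data = []
        for i in range(0, len(text), 2):
            pair = text[i:i + 2]
            if pair not in index:
                if len(table) == 256:
                    raise ValueError("too many pair combinations")
                index[pair] = len(table)
                table.append(pair)
            data.append(index[pair])
        all_data.append(data)
    return table, all_data

table, data = build_pair_data(["beast of traal", "a dark cave."])
print(f"you have used {len(table)} letter pairs out of 256")  # ... 13 ...
print(data)  # [[0, 1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
```

Numbering pairs in order of first use keeps the table as small as the text allows; a real tool might instead sort pairs alphabetically so the table is easier to inspect.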

I know all this might look like a fruitless exercise given the mass storage now available even for our Orics; it was just a thought experiment while I relaxed after work.

I guess it could also be used with the alternate character set to mix graphics and text without all that messing around with HIRES drawing times.

I also guess that some clever person could, with clever manipulation of the character set and letter pairs, create pseudo-HIRES images in text mode where each screen would only take up around 500 bytes or so.

Is this all nonsense? If it is and you want to throw rotten fruit at me, could you make it apples or raspberries, as I can make a nice wine out of them.

Post by **Chema** » Tue Sep 15, 2020 9:49 pm

I am quite sure the text in most adventure games was already compressed... At least the Professional Adventure Writer System (successor of The Quill) did this, creating a dictionary so that the most used letter combinations took up just one byte.
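
The general idea of a dictionary of common letter combinations can be sketched as below. This is not PAW's actual format, just a hypothetical scheme assuming plain 7-bit ASCII text: bytes 0-127 are ordinary characters and bytes 128-255 index a small dictionary of frequent combinations. The dictionary contents are made up; a real tool would choose them by counting the text.

```python
# Hypothetical dictionary of frequent letter combinations.
# Codes 0-127 are plain ASCII; code 128 + n stands for DICTIONARY[n].
DICTIONARY = ["the ", "you ", "ing", "and ", "er", "th", "e ", "s "]

def compress(text: str) -> bytes:
    out = bytearray()
    i = 0
    while i < len(text):
        # Greedy: take the longest dictionary entry that matches at position i.
        best = max((n for n, w in enumerate(DICTIONARY) if text.startswith(w, i)),
                   key=lambda n: len(DICTIONARY[n]), default=None)
        if best is None:
            if ord(text[i]) >= 128:
                raise ValueError("only 7-bit ASCII is supported")
            out.append(ord(text[i]))  # plain character
            i += 1
        else:
            out.append(128 + best)
            i += len(DICTIONARY[best])
    return bytes(out)

def expand(data: bytes) -> str:
    return "".join(DICTIONARY[b - 128] if b >= 128 else chr(b) for b in data)

msg = "you are standing in the hall and there is a door"
packed = compress(msg)
print(len(msg), "->", len(packed))
```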

Also, version 3 of the Infocom Z-machine supported games with up to 128K of data. On the Oric, pinforic handles this by loading chunks of data from disk as needed, so in theory the limit in this case is the size of the disk.

As a side note, something similar was done in Blake's 7 (loading data on demand).

Post by **Badger** » Wed Sep 16, 2020 6:37 am

You are certainly correct, Chema, and I didn't really think I was coming up with anything new. However, I didn't know that it had been implemented in any way on the Oric, which it sounds like it was.

While you mention it, Blake's 7 is a great game, even if I can't get very far. I wonder if games like that and 1337, Space 1999, etc. would have made a disk drive more in demand, pushing production runs higher and costs down, so we would have had more availability. That's a whole different topic: "Is there a killer app/game that would have made you buy a disk drive?"

Post by **Dbug** » Wed Sep 16, 2020 4:39 pm

I believe "Cube" by Fabrice Frances pushed text storage quite far; the amount of text he managed to fit in just 4KB is quite impressive.

https://www.oric.org/software/cube_4k-2499.html

Post by **Symoon** » Tue Sep 22, 2020 12:30 pm

What was done in Mercenary III for the Atari ST and Amiga was storing the text in 6 bits per character.
Hence 4 characters in 3 bytes (4 × 6 = 24 bits), allowing 64 different symbols (26 letters, 10 digits, space, and ! * , ? etc.),
along with a dictionary of 5 or 6 words, which I guess used some of the unused values among the 64 possible.

Certainly not the best solution, but probably a rather easy one.
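
A sketch of 6-bit packing, assuming a hypothetical upper-case character set of at most 64 symbols (this is not Mercenary III's actual table). Four 6-bit codes make 24 bits, which fit exactly in 3 bytes; any codes left over in the 64 could be used for dictionary words.

```python
# Hypothetical character set; must have at most 64 entries (6 bits).
CHARSET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,!?'-:;()*"
assert len(CHARSET) <= 64  # spare codes could stand for dictionary words

def pack6(text: str) -> bytes:
    """Pack 4 characters into every 3 bytes. Raises ValueError on unknown chars."""
    codes = [CHARSET.index(c) for c in text]
    codes += [0] * (-len(codes) % 4)  # pad with spaces to a multiple of 4
    out = bytearray()
    for i in range(0, len(codes), 4):
        v = (codes[i] << 18) | (codes[i + 1] << 12) | (codes[i + 2] << 6) | codes[i + 3]
        out += bytes([v >> 16, (v >> 8) & 0xFF, v & 0xFF])
    return bytes(out)

def unpack6(data: bytes) -> str:
    """Reverse of pack6; padding comes back as trailing spaces."""
    chars = []
    for i in range(0, len(data), 3):
        v = (data[i] << 16) | (data[i + 1] << 8) | data[i + 2]
        for shift in (18, 12, 6, 0):
            chars.append(CHARSET[(v >> shift) & 0x3F])
    return "".join(chars)

s = "YOU ARE IN A MAZE."
print(len(s), "->", len(pack6(s)))  # 18 -> 15
```

The padding means a string may come back with up to three extra trailing spaces, which is harmless when printing.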

Post by **Chema** » Tue Sep 22, 2020 2:35 pm

I remember using a text compression routine for some games. In my case, I needed to decompress strings of text on demand, so I needed something fast that could decompress from a given point up to the end of the string while printing.

The discussion and the final asm routine are here:
https://forum.defence-force.org/viewtopic.php?f=4&t=190

The compression ratio was around 40-50% if I remember correctly. Not that much, but it was fast enough. Maybe a bit naive...

Post by **retroric** » Sat Oct 10, 2020 6:42 pm

Hi,

Another approach would be dictionary-based: instead of trying to optimize (or compress) the encoding of individual letters, you would use one or more bytes as indices into a lookup table of words.

One approach I thought about would be inspired by the encoding of code points in Unicode, especially UTF-8, which uses variable-length encoding (1 byte for the most common code points, and 2, 3 or 4 bytes for progressively rarer ones).

This technique could be applied to encoding words:
* encoding indices to the most common words (or to the shortest words like articles, punctuation and numerals) in one byte
* using 2 bytes to encode less common words.

As you don't want to encode words you don't use (or else you end up having to encode in excess of 50,000 words), you need to have your text processed to single out every distinct word used and allocate a one-byte or two-byte encoding to it, depending on its frequency (number of occurrences) in the text.
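
A sketch of that frequency pass, assuming words are separated by spaces (punctuation would need splitting into its own tokens in practice). Words are ranked by number of occurrences, so the most frequent get the lowest indices and therefore one-byte codes. `assign_codes`, `encoded_size` and the choice of 127 as the reserved end-of-sentence value are hypothetical.

```python
from collections import Counter

ONE_BYTE_SLOTS = 127  # indices 0-126 get one byte; 127 is kept for end-of-sentence

def assign_codes(sentences: list[str]) -> dict[str, int]:
    """Give each distinct word an index, most frequent words first."""
    counts = Counter(word for s in sentences for word in s.split())
    # Descending frequency; ties broken alphabetically so the order is repeatable.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i for i, word in enumerate(ranked)}

def encoded_size(sentences: list[str], codes: dict[str, int]) -> int:
    """Bytes needed for all sentences: 1 or 2 per word, plus 1 end code each."""
    size = 0
    for s in sentences:
        size += sum(1 if codes[w] < ONE_BYTE_SLOTS else 2 for w in s.split())
        size += 1
    return size

text = ["you are in the hall", "you see the door", "the door is locked"]
codes = assign_codes(text)
print(codes["the"], codes["you"], codes["door"])  # 0 2 1
print(encoded_size(text, codes))                  # 16
```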

Using a two-byte encoding, you could have:

* One-byte values: 7 bits (bit 7 cleared: 0xxxxxxx), so you can encode the 128 most used words in the text in one byte (really, it is best to reserve one of these values as an 'end of sentence' code to terminate the stream of codes that forms a sentence).
* Two-byte values: 15 bits (1xxxxxxx yyyyyyyy). Combining the 7 low bits of the first byte (the one with bit 7 set) with the 8 bits of the second byte gives 15 bits, enough to encode up to 32,768 words, which I think is much more than you'll ever need (although in some languages you do need to store plurals and feminine/masculine variants of words as separate words, and you also need to encode numerals, numbers and punctuation, including space).
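
The byte layout above can be sketched like this, assuming (hypothetically) that 0x7F is the reserved end-of-sentence code, so word indices 0-126 fit in one byte and anything larger uses the 15-bit two-byte form.

```python
END = 0x7F  # hypothetical end-of-sentence code, one of the 128 one-byte values

def encode_sentence(indices: list[int]) -> bytes:
    """Turn a list of word indices into a code stream terminated by END."""
    out = bytearray()
    for n in indices:
        if n < END:                      # 0xxxxxxx: one byte, bit 7 clear
            out.append(n)
        elif n < 0x8000:                 # 1xxxxxxx yyyyyyyy: 15-bit index
            out += bytes([0x80 | (n >> 8), n & 0xFF])
        else:
            raise ValueError("index does not fit in 15 bits")
    out.append(END)
    return bytes(out)

def decode_sentence(data: bytes, pos: int = 0) -> tuple[list[int], int]:
    """Read codes from data[pos:] up to END; return (indices, position after END)."""
    indices = []
    while data[pos] != END:
        b = data[pos]
        if b & 0x80:                     # bit 7 set: combine with the next byte
            indices.append(((b & 0x7F) << 8) | data[pos + 1])
            pos += 2
        else:
            indices.append(b)
            pos += 1
    return indices, pos + 1

stream = encode_sentence([1, 2]) + encode_sentence([300])
first, pos = decode_sentence(stream)
second, _ = decode_sentence(stream, pos)
print(first, second)  # [1, 2] [300]
```

Because the first byte of a two-byte code always has bit 7 set, it can never be mistaken for the end marker, and decoding can start at any sentence offset.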

Now, you also of course need memory areas to store both the lookup tables (indices to the addresses of words in memory) and the words themselves, as NUL-terminated strings. Alternatively, to save some bytes, you can use the approach used in the ROM for storing BASIC tokens: set bit 7 on the last letter of each token, so you can detect the last letter and don't need the NUL terminator. This latter technique would yield a huge saving, equal to the number of distinct words: e.g. for 2,000 words, you save 2,000 bytes.
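
A sketch of the bit-7 terminator trick for the word area, assuming all words are non-empty 7-bit ASCII. The hypothetical `build_word_area` returns the packed words plus a table of start offsets (standing in for the 2-byte addresses), and the last line compares the size against NUL-terminated storage.

```python
def build_word_area(words: list[str]) -> tuple[bytes, list[int]]:
    """Store words back to back, marking the last letter of each with bit 7.
    Returns (word area, table of start offsets)."""
    area = bytearray()
    addresses = []
    for w in words:
        addresses.append(len(area))
        raw = w.encode("ascii")          # assumes non-empty 7-bit ASCII words
        area += raw[:-1]
        area.append(raw[-1] | 0x80)      # bit 7 set flags the last letter
    return bytes(area), addresses

def read_word(area: bytes, addr: int) -> str:
    """Read characters from addr until one with bit 7 set."""
    chars = []
    while True:
        b = area[addr]
        chars.append(chr(b & 0x7F))      # strip the end marker
        if b & 0x80:
            return "".join(chars)
        addr += 1

words = ["north", "south", "lamp", "the"]
area, addrs = build_word_area(words)
nul_size = sum(len(w) + 1 for w in words)
print(len(area), nul_size)  # 17 21
```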

For each word, you then need two bytes for the address in the lookup table, plus of course the number of bytes necessary to store the word as ASCII bytes.

Finally, you also need some routines:
- an encode routine to encode all the sentences in your game as a series of one- or two-byte codes;
- a decode routine to translate a sentence expressed as a stream of codes into individual words. As a reminder, you use a particular one-byte code in the stream to denote the end of the sentence.

And you need two areas in memory to store the words (one for the one-byte coded words, the other for the two-byte coded words), using a NUL as a word separator. This table of words is the BIG area of memory.

The only overhead of this approach lies in the memory needed for the address tables (2 bytes per distinct word used) and for the decoding routine, as the frequency analysis and encoding only need to be done once per game, by a separate program that produces the index and word tables, which can be saved (and later loaded back) as memory blocks. So in effect I think you only 'lose' space for words that are used only once in the text.

Now, this is all something I thought up while doing my shopping this afternoon, after reading the initial posts in the morning, so I have no idea of the real efficiency of this approach. The only thing I know for sure is that it is probably quite complicated and tedious to implement and use, and probably a bit slow in-game when printing sentences.

forum.defence-force.org
