ANEW --LZSS-- \ Wil Baden 2003-02-21 \ ******************************************************************* \ * * \ * Haruhiko Okumura 89-04-06 C * \ * Wil Baden 94-12-09 Standard Forth * \ * * \ * LZSS -- A Data Compression Program * \ * * \ * Use, distribute, and modify this program freely. * \ * * \ ******************************************************************* : ARRAY CREATE CELLS ALLOT DOES> SWAP CELLS + ; : CARRAY CREATE chars ALLOT DOES> SWAP chars + ; : CHECKED ABORT" File Access Error. " ; CREATE Single-Char-I/O-Buffer 0 C, align : Read-Char ( file -- char ) Single-Char-I/O-Buffer 1 ROT READ-FILE CHECKED IF Single-Char-I/O-Buffer C@ ELSE -1 THEN ; 4096 CONSTANT N \ Size of Ring Buffer 18 CONSTANT F \ Upper Limit for match-length 2 CONSTANT THRESHOLD \ Encode string into position & length \ if match-length is greater. N CONSTANT NIL \ Index for Binary Search Tree Root VARIABLE TEXTSIZE \ Text Size Counter VARIABLE CODESIZE \ Code Size Counter \ VARIABLE printcount \ Counter for Reporting Progress \ These are set by InsertNode procedure. VARIABLE Match-Position VARIABLE Match-Length \ Ring buffer of size N, with extra \ F-1 bytes to facilitate string comparison. N F + 1 - CARRAY Text-Buf \ *** Left & Right Children and Parents -- Binary Search Trees **** N 1 + ARRAY LSON N 257 + ARRAY RSON N 1 + ARRAY DAD \ ********************* Input & Output Files ********************** 0 VALUE INFILE 0 VALUE OUTFILE \ *************************** InitTrees *************************** \ For i = 0 to N - 1, RSON[i] and LSON[i] will be the right and \ left children of node i. These nodes need not be initialized. \ Also, DAD[i] is the parent of node i. These are initialized to \ Nil = N, which stands for `not used.' \ For i = 0 to 255, rson[N + i + 1] is the root of the tree \ for strings that begin with character i. These are initialized \ to Nil. Note there are 256 trees. \ Initialize trees. : InitTree ( -- ) N 257 + N 1 + DO NIL I RSON ! LOOP N 0 DO NIL I DAD ! LOOP ; \ ************************** InsertNode *************************** \ Insert string of length F, text_buf[r..r+F-1], into one of the \ trees of text_buf[r]'th tree and return the longest-match position \ and length via the global variables match-position and match-length. \ If match-length = F, then remove the old node in favor of the new \ one, because the old one will be deleted sooner. \ Note r plays double role, as tree node and position in buffer. : InsertNode ( r -- ) NIL over LSON ! NIL over RSON ! 0 Match-Length ! dup Text-Buf C@ N + 1 + ( r p) 1 ( r p cmp) BEGIN ( r p cmp) 0< NOT IF ( r p) dup RSON @ NIL = NOT IF RSON @ ELSE 2dup RSON ! SWAP DAD ! ( ) EXIT THEN ELSE ( r p) dup LSON @ NIL = NOT IF LSON @ ELSE 2dup LSON ! SWAP DAD ! ( ) EXIT THEN THEN ( r p) 0 F dup 1 DO ( r p 0 F) 3 PICK I + Text-Buf C@ ( r p 0 F c) 3 PICK I + Text-Buf C@ - ( r p 0 F diff) ?dup IF NIP NIP I LEAVE THEN ( r p 0 F) LOOP ( r p cmp i) dup Match-Length @ > IF 2 PICK Match-Position ! dup Match-Length ! F < NOT ELSE DROP FALSE THEN ( r p cmp flag) UNTIL ( r p cmp) DROP ( r p) 2dup DAD @ SWAP DAD ! 2dup LSON @ SWAP LSON ! 2dup RSON @ SWAP RSON ! 2dup LSON @ DAD ! 2dup RSON @ DAD ! dup DAD @ RSON @ over = IF TUCK DAD @ RSON ! ELSE TUCK DAD @ LSON ! THEN ( p) DAD NIL SWAP ! ( Remove p ) ( ) ; \ ************************** DeleteNode *************************** \ Deletes node p from tree. : DeleteNode ( p -- ) dup DAD @ NIL = IF DROP EXIT THEN ( Not in tree. ) ( CASE ) ( p) dup RSON @ NIL = IF dup LSON @ ELSE dup LSON @ NIL = IF dup RSON @ ELSE dup LSON @ ( p q) dup RSON @ NIL = NOT IF BEGIN RSON @ dup RSON @ NIL = UNTIL dup LSON @ over DAD @ RSON ! dup DAD @ over LSON @ DAD ! over LSON @ over LSON ! over LSON @ DAD over SWAP ! THEN over RSON @ over RSON ! over RSON @ DAD over SWAP ! ( ESAC ) THEN THEN ( p q) over DAD @ over DAD ! over dup DAD @ RSON @ = IF over DAD @ RSON ! ELSE over DAD @ LSON ! THEN ( p) DAD NIL SWAP ! ( ) ; \ **************************** ENCODE ***************************** 17 CARRAY Code-Buf VARIABLE LEN VARIABLE Last-Match-Length VARIABLE Code-Buf-PTR VARIABLE MASK : ENCODE ( -- ) 0 TEXTSIZE ! 0 CODESIZE ! InitTree ( Initialize trees. ) \ code_buf[1..16] saves eight units of code, and code_buf[0] \ works as eight flags, "1" representing that the unit is an \ unencoded letter in 1 byte, "0" a position-and-length pair \ in 2 bytes. Thus, eight units require at most 16 bytes \ of code. 0 0 Code-Buf C! 1 MASK C! 1 Code-Buf-PTR ! 0 N F - ( s r) \ Clear the buffer with any character that will appear often. 0 Text-Buf N F - BL FILL \ Read F bytes into the last F bytes of the buffer. dup Text-Buf F INFILE READ-FILE CHECKED ( s r count) dup LEN ! dup TEXTSIZE ! 0= IF EXIT THEN ( s r) \ Insert the F strings, each of which begins with one or more \ `space' characters. Note the order in which these strings \ are inserted. This way, degenerate trees will be less \ likely to occur. F 1 + 1 DO ( s r) dup I - InsertNode LOOP \ Finally, insert the whole string just read. The \ global variables match-length and match-position are set. dup InsertNode BEGIN ( s r) \ match_length may be spuriously long at end of text. Match-Length @ LEN @ > IF LEN @ Match-Length ! THEN Match-Length @ THRESHOLD > NOT IF \ Not long enough match. Send one byte. 1 Match-Length ! \ `send one byte' flag MASK C@ 0 Code-Buf C@ OR 0 Code-Buf C! \ Send uncoded. dup Text-Buf C@ Code-Buf-PTR @ Code-Buf C! 1 Code-Buf-PTR +! ELSE \ Send position and length pair. \ Note match-length > Threshold. Match-Position @ Code-Buf-PTR @ Code-Buf C! 1 Code-Buf-PTR +! Match-Position @ 8 RSHIFT 4 LSHIFT ( . . j) Match-Length @ THRESHOLD - 1 - OR Code-Buf-PTR @ Code-Buf C! ( . .) 1 Code-Buf-PTR +! THEN \ Shift mask left one bit. ( . .) MASK C@ 2* MASK C! MASK C@ 0= IF \ Send at most 8 units of code together. 0 Code-Buf Code-Buf-PTR @ ( . . a k) OUTFILE WRITE-FILE CHECKED ( . .) Code-Buf-PTR @ CODESIZE +! 0 0 Code-Buf C! 1 Code-Buf-PTR ! 1 MASK C! THEN ( s r) Match-Length @ Last-Match-Length ! Last-Match-Length @ dup 0 DO ( s r n) INFILE Read-Char ( s r n c) dup 0< IF 2DROP I LEAVE THEN \ Delete old strings and read new bytes. 3 PICK DeleteNode dup 3 1 + PICK Text-Buf C! \ If the position is near end of buffer, extend \ the buffer to make string comparison easier. 3 PICK F 1 - < IF ( s r n c) dup 3 1 + PICK N + Text-Buf C! THEN DROP ( s r n) \ Since this is a ring buffer, increment the \ position modulo N. >R >R ( s) 1+ N 1- AND R> ( s r) 1+ N 1- AND R> ( s r n) \ Register the string in text_buf[r..r+F-1]. over InsertNode LOOP ( s r i) dup TEXTSIZE +! \ textsize @ printcount @ > IF \ \ Report progress each time the textsize exceeds \ \ multiples of 1024. \ textsize @ 12 .R \ 1024 printcount +! \ THEN \ After the end of text, no need to read, but \ buffer may not be empty. Last-Match-Length @ SWAP ?DO ( s r) over DeleteNode >R 1 + N 1 - AND R> 1 + N 1 - AND -1 LEN +! LEN @ IF dup InsertNode THEN LOOP LEN @ 0> NOT UNTIL 2DROP \ Send remaining code. Code-Buf-PTR @ 1 > IF 0 Code-Buf Code-Buf-PTR @ OUTFILE WRITE-FILE CHECKED Code-Buf-PTR @ CODESIZE +! THEN ; \ ************************** STATISTICS *************************** : STATISTICS ( -- ) ." In : " TEXTSIZE ? CR ." Out: " CODESIZE ? CR TEXTSIZE @ IF ." Saved: " TEXTSIZE @ CODESIZE @ - 100 TEXTSIZE @ */ 2 .R ." %" CR THEN INFILE CLOSED OUTFILE CLOSED ; \ **************************** DECODE ***************************** \ Just the reverse of Encode. : DECODE ( -- ) 0 Text-Buf N F - BL FILL 0 N F - ( flags r) BEGIN >R ( flags) 1 RSHIFT dup 256 AND 0= IF DROP ( ) INFILE Read-Char ( c) dup 0< IF R> 2DROP EXIT ( c) THEN [ HEX ] 0FF00 [ DECIMAL ] OR ( flags) ( Uses higher byte to count eight. ) THEN R> ( flags r) over 1 AND IF INFILE Read-Char ( . . c) dup 0< IF DROP 2DROP EXIT ( . r c) THEN over Text-Buf C! ( . r) dup Text-Buf 1 OUTFILE WRITE-FILE CHECKED 1 + N 1 - AND ELSE INFILE Read-Char ( . . i) dup 0< IF DROP 2DROP EXIT ( . r c) THEN INFILE Read-Char ( . . i j) dup 0< IF 2DROP 2DROP EXIT ( . . i j) THEN dup >R 4 RSHIFT 8 LSHIFT OR R> 15 AND THRESHOLD + 1 + 0 ?DO ( . r i) dup I + N 1 - AND Text-Buf ( . r i a) dup 1 OUTFILE WRITE-FILE CHECKED C@ 2 PICK Text-Buf C! ( . r i) >R 1 + N 1 - AND R> LOOP ( . r i) DROP ( flags r) THEN AGAIN ; \\ // \\ // \\ // \\ // \\ // \\ // \\ // \\