[llvm-commits] [test-suite] r65042 - in /test-suite/trunk/MultiSource/Benchmarks/PAQ8p: ./ Makefile file1.in paq8p.cpp readme.txt
Torok Edwin
edwintorok at gmail.com
Thu Feb 19 04:03:00 PST 2009
Author: edwin
Date: Thu Feb 19 06:02:53 2009
New Revision: 65042
URL: http://llvm.org/viewvc/llvm-project?rev=65042&view=rev
Log:
Add PAQ8p (arithmetic compression program) as benchmark.
Although not seen from the test reports, gcc -O2 is about 15%-35% faster than
gcc at -O3, or llvm-gcc at -O2 or -O3. See PR3534.
Added:
test-suite/trunk/MultiSource/Benchmarks/PAQ8p/
test-suite/trunk/MultiSource/Benchmarks/PAQ8p/Makefile
test-suite/trunk/MultiSource/Benchmarks/PAQ8p/file1.in (with props)
test-suite/trunk/MultiSource/Benchmarks/PAQ8p/paq8p.cpp
test-suite/trunk/MultiSource/Benchmarks/PAQ8p/readme.txt
Added: test-suite/trunk/MultiSource/Benchmarks/PAQ8p/Makefile
URL: http://llvm.org/viewvc/llvm-project/test-suite/trunk/MultiSource/Benchmarks/PAQ8p/Makefile?rev=65042&view=auto
==============================================================================
--- test-suite/trunk/MultiSource/Benchmarks/PAQ8p/Makefile (added)
+++ test-suite/trunk/MultiSource/Benchmarks/PAQ8p/Makefile Thu Feb 19 06:02:53 2009
@@ -0,0 +1,11 @@
+LEVEL = ../../../
+
+PROG = paq8p
+CPPFLAGS += -DNOASM -DLLVM
+LDFLAGS = -lstdc++ -lm
+ifdef SMALL_PROBLEM_SIZE
+RUN_OPTIONS = -1 file1.in
+else
+RUN_OPTIONS = -4 file1.in
+endif
+include $(LEVEL)/MultiSource/Makefile.multisrc
Added: test-suite/trunk/MultiSource/Benchmarks/PAQ8p/file1.in
URL: http://llvm.org/viewvc/llvm-project/test-suite/trunk/MultiSource/Benchmarks/PAQ8p/file1.in?rev=65042&view=auto
==============================================================================
Binary file - no diff available.
Propchange: test-suite/trunk/MultiSource/Benchmarks/PAQ8p/file1.in
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: test-suite/trunk/MultiSource/Benchmarks/PAQ8p/paq8p.cpp
URL: http://llvm.org/viewvc/llvm-project/test-suite/trunk/MultiSource/Benchmarks/PAQ8p/paq8p.cpp?rev=65042&view=auto
==============================================================================
--- test-suite/trunk/MultiSource/Benchmarks/PAQ8p/paq8p.cpp (added)
+++ test-suite/trunk/MultiSource/Benchmarks/PAQ8p/paq8p.cpp Thu Feb 19 06:02:53 2009
@@ -0,0 +1,4494 @@
+/* paq8p file compressor/archiver. Release by Andreas Morphis, Aug. 22, 2008
+
+ Copyright (C) 2008 Matt Mahoney, Serge Osnach, Alexander Ratushnyak,
+ Bill Pettis, Przemyslaw Skibinski, Matthew Fite, wowtiger, Andrew Paterson,
+ Jan Ondrus, Andreas Morphis, Pavel L. Holoborodko, KZ.
+
+ LICENSE
+
+ This program is free software; you can redistribute it and/or
+ modify it under the terms of the GNU General Public License as
+ published by the Free Software Foundation; either version 2 of
+ the License, or (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details at
+ Visit <http://www.gnu.org/copyleft/gpl.html>.
+
+To install and use in Windows:
+
+- To install, put paq8p.exe or a shortcut to it on your desktop.
+- To compress a file or folder, drop it on the paq8p icon.
+- To decompress, drop a .paq8p file on the icon.
+
+A .paq8p extension is added for compression, removed for decompression.
+The output will go in the same folder as the input.
+
+While paq8p is working, a command window will appear and report
+progress. When it is done you can close the window by pressing
+ENTER or clicking [X].
+
+
+COMMAND LINE INTERFACE
+
+- To install, put paq8p.exe somewhere in your PATH.
+- To compress: paq8p [-N] file1 [file2...]
+- To decompress: paq8p [-d] file1.paq8p [dir2]
+- To view contents: more < file1.paq8p
+
+The compressed output file is named by adding ".paq8p" extension to
+the first named file (file1.paq8p). Each file that exists will be
+added to the archive and its name will be stored without a path.
+The option -N specifies a compression level ranging from -0
+(fastest) to -9 (smallest). The default is -5. If there is
+no option and only one file, then the program will pause when
+finished until you press the ENTER key (to support drag and drop).
+If file1.paq8p exists then it is overwritten.
+
+If the first named file ends in ".paq8p" then it is assumed to be
+an archive and the files within are extracted to the same directory
+as the archive unless a different directory (dir2) is specified.
+The -d option forces extraction even if there is not a ".paq8p"
+extension. If any output file already exists, then it is compared
+with the archive content and the first byte that differs is reported.
+No files are overwritten or deleted. If there is only one argument
+(no -d or dir2) then the program will pause when finished until
+you press ENTER.
+
+For compression, if any named file is actually a directory, then all
+files and subdirectories are compressed, preserving the directory
+structure, except that empty directories are not stored, and file
+attributes (timestamps, permissions, etc.) are not preserved.
+During extraction, directories are created as needed. For example:
+
+ paq8p -4 c:\tmp\foo bar
+
+compresses foo and bar (if they exist) to c:\tmp\foo.paq8p at level 4.
+
+ paq8p -d c:\tmp\foo.paq8p .
+
+extracts foo and compares bar in the current directory. If foo and bar
+are directories then their contents are extracted/compared.
+
+There are no commands to update an existing archive or to extract
+part of an archive. Files and archives larger than 2GB are not
+supported (but might work on 64-bit machines, not tested).
+File names with nonprintable characters are not supported (spaces
+are OK).
+
+
+TO COMPILE
+
+There are 2 files: paq8p.cpp (C++) and paq7asm.asm (NASM/YASM).
+paq7asm.asm is the same as in paq7 and paq8x. paq8p.cpp recognizes the
+following compiler options:
+
+ -DWINDOWS (to compile in Windows)
+ -DUNIX (to compile in Unix, Linux, Solairs, MacOS/Darwin, etc)
+ -DNOASM (to replace paq7asm.asm with equivalent C++)
+ -DDEFAULT_OPTION=N (to change the default compression level from 5 to N).
+
+If you compile without -DWINDOWS or -DUNIX, you can still compress files,
+but you cannot compress directories or create them during extraction.
+You can extract directories if you manually create the empty directories
+first.
+
+Use -DEFAULT_OPTION=N to change the default compression level to support
+drag and drop on machines with less than 256 MB of memory. Use
+-DDEFAULT_OPTION=4 for 128 MB, 3 for 64 MB, 2 for 32 MB, etc.
+
+Use -DNOASM for non x86-32 machines, or older than a Pentium-MMX (about
+1997), or if you don't have NASM or YASM to assemble paq7asm.asm. The
+program will still work but it will be slower. For NASM in Windows,
+use the options "--prefix _" and either "-f win32" or "-f obj" depending
+on your C++ compiler. In Linux, use "-f elf".
+
+Recommended compiler commands and optimizations:
+
+ MINGW g++:
+ nasm paq7asm.asm -f win32 --prefix _
+ g++ paq8p.cpp -DWINDOWS -O2 -Os -s -march=pentiumpro -fomit-frame-pointer -o paq8p.exe paq7asm.obj
+
+ Borland:
+ nasm paq7asm.asm -f obj --prefix _
+ bcc32 -DWINDOWS -O -w-8027 paq8p.cpp paq7asm.obj
+
+ Mars:
+ nasm paq7asm.asm -f obj --prefix _
+ dmc -DWINDOWS -Ae -O paq8p.cpp paq7asm.obj
+
+ UNIX/Linux (PC):
+ nasm -f elf paq7asm.asm
+ g++ paq8p.cpp -DUNIX -O2 -Os -s -march=pentiumpro -fomit-frame-pointer -o paq8p paq7asm.o
+
+ Non PC (e.g. PowerPC under MacOS X)
+ g++ paq8p.cpp -O2 -DUNIX -DNOASM -s -o paq8p
+
+MinGW produces faster executables than Borland or Mars, but Intel 9
+is about 4% faster than MinGW).
+
+
+ARCHIVE FILE FORMAT
+
+An archive has the following format. It is intended to be both
+human and machine readable. The header ends with CTRL-Z (Windows EOF)
+so that the binary compressed data is not displayed on the screen.
+
+ paq8p -N CR LF
+ size TAB filename CR LF
+ size TAB filename CR LF
+ ...
+ CTRL-Z
+ compressed binary data
+
+-N is the option (-0 to -9), even if a default was used.
+Plain file names are stored without a path. Files in compressed
+directories are stored with path relative to the compressed directory
+(using UNIX style forward slashes "/"). For example, given these files:
+
+ 123 C:\dir1\file1.txt
+ 456 C:\dir2\file2.txt
+
+Then
+
+ paq8p archive \dir1\file1.txt \dir2
+
+will create archive.paq8p with the header:
+
+ paq8p -5
+ 123 file1.txt
+ 456 dir2/file2.txt
+
+The command:
+
+ paq8p archive.paq8p C:\dir3
+
+will create the files:
+
+ C:\dir3\file1.txt
+ C:\dir3\dir2\file2.txt
+
+Decompression will fail if the first 7 bytes are not "paq8p -". Sizes
+are stored as decimal numbers. CR, LF, TAB, CTRL-Z are ASCII codes
+13, 10, 9, 26 respectively.
+
+
+ARITHMETIC CODING
+
+The binary data is arithmetic coded as the shortest base 256 fixed point
+number x = SUM_i x_i 256^-1-i such that p(<y) <= x < p(<=y), where y is the
+input string, x_i is the i'th coded byte, p(<y) (and p(<=y)) means the
+probability that a string is lexicographcally less than (less than
+or equal to) y according to the model, _ denotes subscript, and ^ denotes
+exponentiation.
+
+The model p(y) for y is a conditional bit stream,
+p(y) = PROD_j p(y_j | y_0..j-1) where y_0..j-1 denotes the first j
+bits of y, and y_j is the next bit. Compression depends almost entirely
+on the ability to predict the next bit accurately.
+
+
+MODEL MIXING
+
+paq8p uses a neural network to combine a large number of models. The
+i'th model independently predicts
+p1_i = p(y_j = 1 | y_0..j-1), p0_i = 1 - p1_i.
+The network computes the next bit probabilty
+
+ p1 = squash(SUM_i w_i t_i), p0 = 1 - p1 (1)
+
+where t_i = stretch(p1_i) is the i'th input, p1_i is the prediction of
+the i'th model, p1 is the output prediction, stretch(p) = ln(p/(1-p)),
+and squash(s) = 1/(1+exp(-s)). Note that squash() and stretch() are
+inverses of each other.
+
+After bit y_j (0 or 1) is received, the network is trained:
+
+ w_i := w_i + eta t_i (y_j - p1) (2)
+
+where eta is an ad-hoc learning rate, t_i is the i'th input, (y_j - p1)
+is the prediction error for the j'th input but, and w_i is the i'th
+weight. Note that this differs from back propagation:
+
+ w_i := w_i + eta t_i (y_j - p1) p0 p1 (3)
+
+which is a gradient descent in weight space to minimize root mean square
+error. Rather, the goal in compression is to minimize coding cost,
+which is -log(p0) if y = 1 or -log(p1) if y = 0. Taking
+the partial derivative of cost with respect to w_i yields (2).
+
+
+MODELS
+
+Most models are context models. A function of the context (last few
+bytes) is mapped by a lookup table or hash table to a state which depends
+on the bit history (prior sequence of 0 and 1 bits seen in this context).
+The bit history is then mapped to p1_i by a fixed or adaptive function.
+There are several types of bit history states:
+
+- Run Map. The state is (b,n) where b is the last bit seen (0 or 1) and
+ n is the number of consecutive times this value was seen. The initial
+ state is (0,0). The output is computed directly:
+
+ t_i = (2b - 1)K log(n + 1).
+
+ where K is ad-hoc, around 4 to 10. When bit y_j is seen, the state
+ is updated:
+
+ (b,n) := (b,n+1) if y_j = b, else (y_j,1).
+
+- Stationary Map. The state is p, initially 1/2. The output is
+ t_i = stretch(p). The state is updated at ad-hoc rate K (around 0.01):
+
+ p := p + K(y_j - p)
+
+- Nonstationary Map. This is a compromise between a stationary map, which
+ assumes uniform statistics, and a run map, which adapts quickly by
+ discarding old statistics. An 8 bit state represents (n0,n1,h), initially
+ (0,0,0) where:
+
+ n0 is the number of 0 bits seen "recently".
+ n1 is the number of 1 bits seen "recently".
+ n = n0 + n1.
+ h is the full bit history for 0 <= n <= 4,
+ the last bit seen (0 or 1) if 5 <= n <= 15,
+ 0 for n >= 16.
+
+ The primaty output is t_i := stretch(sm(n0,n1,h)), where sm(.) is
+ a stationary map with K = 1/256, initiaized to
+ sm(n0,n1,h) = (n1+(1/64))/(n+2/64). Four additional inputs are also
+ be computed to improve compression slightly:
+
+ p1_i = sm(n0,n1,h)
+ p0_i = 1 - p1_i
+ t_i := stretch(p_1)
+ t_i+1 := K1 (p1_i - p0_i)
+ t_i+2 := K2 stretch(p1) if n0 = 0, -K2 stretch(p1) if n1 = 0, else 0
+ t_i+3 := K3 (-p0_i if n1 = 0, p1_i if n0 = 0, else 0)
+ t_i+4 := K3 (-p0_i if n0 = 0, p1_i if n1 = 0, else 0)
+
+ where K1..K4 are ad-hoc constants.
+
+ h is updated as follows:
+ If n < 4, append y_j to h.
+ Else if n <= 16, set h := y_j.
+ Else h = 0.
+
+ The update rule is biased toward newer data in a way that allows
+ n0 or n1, but not both, to grow large by discarding counts of the
+ opposite bit. Large counts are incremented probabilistically.
+ Specifically, when y_j = 0 then the update rule is:
+
+ n0 := n0 + 1, n < 29
+ n0 + 1 with probability 2^(27-n0)/2 else n0, 29 <= n0 < 41
+ n0, n = 41.
+ n1 := n1, n1 <= 5
+ round(8/3 lg n1), if n1 > 5
+
+ swapping (n0,n1) when y_j = 1.
+
+ Furthermore, to allow an 8 bit representation for (n0,n1,h), states
+ exceeding the following values of n0 or n1 are replaced with the
+ state with the closest ratio n0:n1 obtained by decrementing the
+ smaller count: (41,0,h), (40,1,h), (12,2,h), (5,3,h), (4,4,h),
+ (3,5,h), (2,12,h), (1,40,h), (0,41,h). For example:
+ (12,2,1) 0-> (7,1,0) because there is no state (13,2,0).
+
+- Match Model. The state is (c,b), initially (0,0), where c is 1 if
+ the context was previously seen, else 0, and b is the next bit in
+ this context. The prediction is:
+
+ t_i := (2b - 1)Kc log(m + 1)
+
+ where m is the length of the context. The update rule is c := 1,
+ b := y_j. A match model can be implemented efficiently by storing
+ input in a buffer and storing pointers into the buffer into a hash
+ table indexed by context. Then c is indicated by a hash table entry
+ and b can be retrieved from the buffer.
+
+
+CONTEXTS
+
+High compression is achieved by combining a large number of contexts.
+Most (not all) contexts start on a byte boundary and end on the bit
+immediately preceding the predicted bit. The contexts below are
+modeled with both a run map and a nonstationary map unless indicated.
+
+- Order n. The last n bytes, up to about 16. For general purpose data.
+ Most of the compression occurs here for orders up to about 6.
+ An order 0 context includes only the 0-7 bits of the partially coded
+ byte and the number of these bits (255 possible values).
+
+- Sparse. Usually 1 or 2 of the last 8 bytes preceding the byte containing
+ the predicted bit, e.g (2), (3),..., (8), (1,3), (1,4), (1,5), (1,6),
+ (2,3), (2,4), (3,6), (4,8). The ordinary order 1 and 2 context, (1)
+ or (1,2) are included above. Useful for binary data.
+
+- Text. Contexts consists of whole words (a-z, converted to lower case
+ and skipping other values). Contexts may be sparse, e.g (0,2) meaning
+ the current (partially coded) word and the second word preceding the
+ current one. Useful contexts are (0), (0,1), (0,1,2), (0,2), (0,3),
+ (0,4). The preceding byte may or may not be included as context in the
+ current word.
+
+- Formatted text. The column number (determined by the position of
+ the last linefeed) is combined with other contexts: the charater to
+ the left and the character above it.
+
+- Fixed record length. The record length is determined by searching for
+ byte sequences with a uniform stride length. Once this is found, then
+ the record length is combined with the context of the bytes immediately
+ preceding it and the corresponding byte locations in the previous
+ one or two records (as with formatted text).
+
+- Context gap. The distance to the previous occurrence of the order 1
+ or order 2 context is combined with other low order (1-2) contexts.
+
+- FAX. For 2-level bitmapped images. Contexts are the surrounding
+ pixels already seen. Image width is assumed to be 1728 bits (as
+ in calgary/pic).
+
+- Image. For uncompressed 24-bit color BMP and TIFF images. Contexts
+ are the high order bits of the surrounding pixels and linear
+ combinations of those pixels, including other color planes. The
+ image width is detected from the file header. When an image is
+ detected, other models are turned off to improve speed.
+
+- JPEG. Files are further compressed by partially uncompressing back
+ to the DCT coefficients to provide context for the next Huffman code.
+ Only baseline DCT-Huffman coded files are modeled. (This ia about
+ 90% of images, the others are usually progresssive coded). JPEG images
+ embedded in other files (quite common) are detected by headers. The
+ baseline JPEG coding process is:
+ - Convert to grayscale and 2 chroma colorspace.
+ - Sometimes downsample the chroma images 2:1 or 4:1 in X and/or Y.
+ - Divide each of the 3 images into 8x8 blocks.
+ - Convert using 2-D discrete cosine transform (DCT) to 64 12-bit signed
+ coefficients.
+ - Quantize the coefficients by integer division (lossy).
+ - Split the image into horizontal slices coded independently, separated
+ by restart codes.
+ - Scan each block starting with the DC (0,0) coefficient in zigzag order
+ to the (7,7) coefficient, interleaving the 3 color components in
+ order to scan the whole image left to right starting at the top.
+ - Subtract the previous DC component from the current in each color.
+ - Code the coefficients using RS codes, where R is a run of R zeros (0-15)
+ and S indicates 0-11 bits of a signed value to follow. (There is a
+ special RS code (EOB) to indicate the rest of the 64 coefficients are 0).
+ - Huffman code the RS symbol, followed by S literal bits.
+ The most useful contexts are the current partially coded Huffman code
+ (including S following bits) combined with the coefficient position
+ (0-63), color (0-2), and last few RS codes.
+
+- Match. When a context match of 400 bytes or longer is detected,
+ the next bit of the match is predicted and other models are turned
+ off to improve speed.
+
+- Exe. When a x86 file (.exe, .obj, .dll) is detected, sparse contexts
+ with gaps of 1-12 selecting only the prefix, opcode, and the bits
+ of the modR/M byte that are relevant to parsing are selected.
+ This model is turned off otherwise.
+
+- Indirect. The history of the last 1-3 bytes in the context of the
+ last 1-2 bytes is combined with this 1-2 byte context.
+
+- DMC. A bitwise n-th order context is built from a state machine using
+ DMC, described in http://plg.uwaterloo.ca/~ftp/dmc/dmc.c
+ The effect is to extend a single context, one bit at a time and predict
+ the next bit based on the history in this context. The model here differs
+ in that two predictors are used. One is a pair of counts as in the original
+ DMC. The second predictor is a bit history state mapped adaptively to
+ a probability as as in a Nonstationary Map.
+
+ARCHITECTURE
+
+The context models are mixed by several of several hundred neural networks
+selected by a low-order context. The outputs of these networks are
+combined using a second neural network, then fed through several stages of
+adaptive probability maps (APM) before arithmetic coding.
+
+For images, only one neural network is used and its context is fixed.
+
+An APM is a stationary map combining a context and an input probability.
+The input probability is stretched and divided into 32 segments to
+combine with other contexts. The output is interpolated between two
+adjacent quantized values of stretch(p1). There are 2 APM stages in series:
+
+ p1 := (p1 + 3 APM(order 0, p1)) / 4.
+ p1 := (APM(order 1, p1) + 2 APM(order 2, p1) + APM(order 3, p1)) / 4.
+
+PREPROCESSING
+
+paq8p uses preprocessing transforms on certain data types to improve
+compression. To improve reliability, the decoding transform is
+tested during compression to ensure that the input file can be
+restored. If the decoder output is not identical to the input file
+due to a bug, then the transform is abandoned and the data is compressed
+without a transform so that it will still decompress correctly.
+
+The input is split into blocks with the format <type> <decoded size> <data>
+where <type> is 1 byte (0 = no transform), <decoded size> is the size
+of the data after decoding, which may be different than the size of <data>.
+Blocks do not span file boundaries, and have a maximum size of 4MB to
+2GB depending on compression level. Large files are split into blocks
+of this size. The preprocessor has 3 parts:
+
+- Detector. Splits the input into smaller blocks depending on data type.
+
+- Coder. Input is a block to be compressed. Output is a temporary
+ file. The coder determines whether a transform is to be applied
+ based on file type, and if so, which one. A coder may use lots
+ of resources (memory, time) and make multiple passes through the
+ input file. The file type is stored (as one byte) during compression.
+
+- Decoder. Performs the inverse transform of the coder. It uses few
+ resorces (fast, low memory) and runs in a single pass (stream oriented).
+ It takes input either from a file or the arithmetic decoder. Each call
+ to the decoder returns a single decoded byte.
+
+The following transforms are used:
+
+- EXE: CALL (0xE8) and JMP (0xE9) address operands are converted from
+ relative to absolute address. The transform is to replace the sequence
+ E8/E9 xx xx xx 00/FF by adding file offset modulo 2^25 (signed range,
+ little-endian format). Data to transform is identified by trying the
+ transform and applying a crude compression test: testing whether the
+ byte following the E8/E8 (LSB of the address) occurred more recently
+ in the transformed data than the original and within 4KB 4 times in
+ a row. The block ends when this does not happen for 4KB.
+
+- JPEG: detected by SOI and SOF and ending with EOI or any nondecodable
+ data. No transform is applied. The purpose is to separate images
+ embedded in execuables to block the EXE transform, and for a future
+ place to insert a transform.
+
+
+IMPLEMENTATION
+
+Hash tables are designed to minimize cache misses, which consume most
+of the CPU time.
+
+Most of the memory is used by the nonstationary context models.
+Contexts are represented by 32 bits, possibly a hash. These are
+mapped to a bit history, represented by 1 byte. The hash table is
+organized into 64-byte buckets on cache line boundaries. Each bucket
+contains 7 x 7 bit histories, 7 16-bit checksums, and a 2 element LRU
+queue packed into one byte. Each 7 byte element represents 7 histories
+for a context ending on a 3-bit boundary plus 0-2 more bits. One
+element (for bits 0-1, which have 4 unused bytes) also contains a run model
+consisting of the last byte seen and a count (as 1 byte each).
+
+Run models use 4 byte hash elements consisting of a 2 byte checksum, a
+repeat count (0-255) and the byte value. The count also serves as
+a priority.
+
+Stationary models are most appropriate for small contexts, so the
+context is used as a direct table lookup without hashing.
+
+The match model maintains a pointer to the last match until a mismatching
+bit is found. At the start of the next byte, the hash table is referenced
+to find another match. The hash table of pointers is updated after each
+whole byte. There is no checksum. Collisions are detected by comparing
+the current and matched context in a rotating buffer.
+
+The inner loops of the neural network prediction (1) and training (2)
+algorithms are implemented in MMX assembler, which computes 4 elements
+at a time. Using assembler is 8 times faster than C++ for this code
+and 1/3 faster overall. (However I found that SSE2 code on an AMD-64,
+which computes 8 elements at a time, is not any faster).
+
+
+DIFFERENCES FROM PAQ7
+
+An .exe model and filter are added. Context maps are improved using 16-bit
+checksums to reduce collisions. The state table uses probabilistic updates
+for large counts, more states that remember the last bit, and decreased
+discounting of the opposite count. It is implemented as a fixed table.
+There are also many minor changes.
+
+DIFFERENCES FROM PAQ8A
+
+The user interface supports directory compression and drag and drop.
+The preprocessor segments the input into blocks and uses more robust
+EXE detection. An indirect context model was added. There is no
+dictionary preprocesor like PAQ8B/C/D/E.
+
+DIFFERENCES FROM PAQ8F
+
+Different models, usually from paq8hp*. Also changed rate from 8 to 7. A bug
+in Array was fixed that caused the program to silently crash upon exit.
+
+DIFFERENCES FROM PAQ8J
+
+1) Slightly improved sparse model.
+2) Added new family of sparse contexts. Each byte mapped to 3-bit value, where
+different values corresponds to different byte classes. For example, input
+byte 0x00 transformed into 0, all bytes that less then 16 -- into 5, all
+punctuation marks (ispunct(c)!=0) -- into 2 etc. Then this flags from 11
+previous bytes combined into 32-bit pseudo-context.
+
+All this improvements gives only 62 byte on BOOK1, but on binaries archive size
+reduced on 1-2%.
+
+DIFFERENCES FROM PAQ8JA
+
+Introduced distance model. Distance model uses distance to last occurence
+of some anchor char ( 0x00, space, newline, 0xff ), combined with previous
+charactes as context. This slightly improves compression of files with
+variable-width record data.
+
+DIFFERENCES FROM PAQ8JB
+
+Restored recordModel(), broken in paq8hp*. Slightly tuned indirectModel().
+
+DIFFERENCES FROM PAQ8JC
+
+Changed the APMs in the Predictor. Up to a 0.2% improvement for some files.
+
+DIFFERENCES FROM PAQ8JD
+
+Added DMCModel. Removed some redundant models from SparseModel and other
+minor tuneups. Changes introduced in PAQ8K were not carried over.
+
+PAQ8L v.2
+
+Changed Mixer::p() to p() to fix a compiler error in Linux
+(patched by Indrek Kruusa, Apr. 15, 2007).
+
+DIFFERENCES FROM PAQ8L, PAQ8M
+
+Modified JPEG model by Jan Ondrus (paq8fthis2). The new model improves
+compression by using decoded pixel values of current and adjacent blocks
+as context. PAQ8M was an earlier version of the new JPEG model
+(from paq8fthis).
+
+DIFFERENCES FROM PAQ8N
+
+Improved bmp model. Slightly faster.
+
+DIFFERENCES FROM PAQ8O
+
+Modified JPEG model by Jan Ondrus (paq8fthis4).
+Added PGM (grayscale image) model form PAQ8I.
+Added grayscale BMP model to PGM model.
+Ver. 2 can be compiled using either old or new "for" loop scoping rules.
+Added APM and StateMap from LPAQ1
+Code optimizations from Enrico Zeidler
+Detection of BMP 4,8,24 bit and PGM 8 bit images before compress
+On non BMP,PGM,JPEG data mem is lower
+Fixed bug in BMP 8-bit detection in other files like .exe
+15. oct 2007
+Updates JPEG model by Jan Ondrus
+PGM detection bug fix
+22. oct 2007
+improved JPEG model by Jan Ondrus
+16. feb 2008
+fixed bmp detection bug
+added .rgb file support (uncompressed grayscale)
+
+DIFFERENCES FROM PAQ8O9
+
+Added wav Model. Slightly improved bmp model.
+*/
+
+#define PROGNAME "paq8p" // Please change this if you change the program.
+
+#ifdef LLVM
+#include <unistd.h>
+#include <sys/wait.h>
+#endif
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <time.h>
+#include <math.h>
+#include <ctype.h>
+#define NDEBUG // remove for debugging (turns on Array bound checks)
+#include <assert.h>
+
+#ifdef UNIX
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <dirent.h>
+#include <errno.h>
+#endif
+
+#ifdef WINDOWS
+#include <windows.h>
+#endif
+
+#ifndef DEFAULT_OPTION
+#define DEFAULT_OPTION 5
+#endif
+
+// 8, 16, 32 bit unsigned types (adjust as appropriate)
+typedef unsigned char U8;
+typedef unsigned short U16;
+typedef unsigned int U32;
+
+// min, max functions
+#ifndef WINDOWS
+inline int min(int a, int b) {return a<b?a:b;}
+inline int max(int a, int b) {return a<b?b:a;}
+#endif
+
+// Error handler: print message if any, and exit
+void quit(const char* message=0) {
+ throw message;
+}
+
+// strings are equal ignoring case?
+int equals(const char* a, const char* b) {
+ assert(a && b);
+ while (*a && *b) {
+ int c1=*a;
+ if (c1>='A'&&c1<='Z') c1+='a'-'A';
+ int c2=*b;
+ if (c2>='A'&&c2<='Z') c2+='a'-'A';
+ if (c1!=c2) return 0;
+ ++a;
+ ++b;
+ }
+ return *a==*b;
+}
+
+//////////////////////// Program Checker /////////////////////
+
+// Track time and memory used
+class ProgramChecker {
+ int memused; // bytes allocated by Array<T> now
+ int maxmem; // most bytes allocated ever
+ clock_t start_time; // in ticks
+public:
+ void alloc(int n) { // report memory allocated, may be negative
+ memused+=n;
+ if (memused>maxmem) maxmem=memused;
+ }
+ ProgramChecker(): memused(0), maxmem(0) {
+ start_time=clock();
+ assert(sizeof(U8)==1);
+ assert(sizeof(U16)==2);
+ assert(sizeof(U32)==4);
+ assert(sizeof(short)==2);
+ assert(sizeof(int)==4);
+ }
+ void print() const { // print time and memory used
+#ifdef LLVM
+ printf("used %d bytes of memory\n",
+ maxmem);
+#else
+ printf("Time %1.2f sec, used %d bytes of memory\n",
+ double(clock()-start_time)/CLOCKS_PER_SEC, maxmem);
+#endif
+ }
+} programChecker;
+
+//////////////////////////// Array ////////////////////////////
+
+// Array<T, ALIGN> a(n); creates n elements of T initialized to 0 bits.
+// Constructors for T are not called.
+// Indexing is bounds checked if assertions are on.
+// a.size() returns n.
+// a.resize(n) changes size to n, padding with 0 bits or truncating.
+// a.push_back(x) appends x and increases size by 1, reserving up to size*2.
+// a.pop_back() decreases size by 1, does not free memory.
+// Copy and assignment are not supported.
+// Memory is aligned on a ALIGN byte boundary (power of 2), default is none.
+
+template <class T, int ALIGN=0> class Array {
+private:
+ int n; // user size
+ int reserved; // actual size
+ char *ptr; // allocated memory, zeroed
+ T* data; // start of n elements of aligned data
+ void create(int i); // create with size i
+public:
+ explicit Array(int i=0) {create(i);}
+ ~Array();
+ T& operator[](int i) {
+#ifndef NDEBUG
+ if (i<0 || i>=n) fprintf(stderr, "%d out of bounds %d\n", i, n), quit();
+#endif
+ return data[i];
+ }
+ const T& operator[](int i) const {
+#ifndef NDEBUG
+ if (i<0 || i>=n) fprintf(stderr, "%d out of bounds %d\n", i, n), quit();
+#endif
+ return data[i];
+ }
+ int size() const {return n;}
+ void resize(int i); // change size to i
+ void pop_back() {if (n>0) --n;} // decrement size
+ void push_back(const T& x); // increment size, append x
+private:
+ Array(const Array&); // no copy or assignment
+ Array& operator=(const Array&);
+};
+
+template<class T, int ALIGN> void Array<T, ALIGN>::resize(int i) {
+ if (i<=reserved) {
+ n=i;
+ return;
+ }
+ char *saveptr=ptr;
+ T *savedata=data;
+ int saven=n;
+ create(i);
+ if (saveptr) {
+ if (savedata) {
+ memcpy(data, savedata, sizeof(T)*min(i, saven));
+ programChecker.alloc(-ALIGN-n*sizeof(T));
+ }
+ free(saveptr);
+ }
+}
+
+template<class T, int ALIGN> void Array<T, ALIGN>::create(int i) {
+ n=reserved=i;
+ if (i<=0) {
+ data=0;
+ ptr=0;
+ return;
+ }
+ const int sz=ALIGN+n*sizeof(T);
+ programChecker.alloc(sz);
+ ptr = (char*)calloc(sz, 1);
+ if (!ptr) quit("Out of memory");
+ data = (ALIGN ? (T*)(ptr+ALIGN-(((long)ptr)&(ALIGN-1))) : (T*)ptr);
+ assert((char*)data>=ptr && (char*)data<=ptr+ALIGN);
+}
+
+template<class T, int ALIGN> Array<T, ALIGN>::~Array() {
+ programChecker.alloc(-ALIGN-n*sizeof(T));
+ free(ptr);
+}
+
+template<class T, int ALIGN> void Array<T, ALIGN>::push_back(const T& x) {
+ if (n==reserved) {
+ int saven=n;
+ resize(max(1, n*2));
+ n=saven;
+ }
+ data[n++]=x;
+}
+
+/////////////////////////// String /////////////////////////////
+
+// A tiny subset of std::string
+// size() includes NUL terminator.
+
+class String: public Array<char> {
+public:
+ const char* c_str() const {return &(*this)[0];}
+ void operator=(const char* s) {
+ resize(strlen(s)+1);
+ strcpy(&(*this)[0], s);
+ }
+ void operator+=(const char* s) {
+ assert(s);
+ pop_back();
+ while (*s) push_back(*s++);
+ push_back(0);
+ }
+ String(const char* s=""): Array<char>(1) {
+ (*this)+=s;
+ }
+};
+
+
+//////////////////////////// rnd ///////////////////////////////
+
+// 32-bit pseudo random number generator
+class Random{
+ Array<U32> table;
+ int i;
+public:
+ Random(): table(64) {
+ table[0]=123456789;
+ table[1]=987654321;
+ for(int j=0; j<62; j++) table[j+2]=table[j+1]*11+table[j]*23/16;
+ i=0;
+ }
+ U32 operator()() {
+ return ++i, table[i&63]=table[i-24&63]^table[i-55&63];
+ }
+} rnd;
+
+////////////////////////////// Buf /////////////////////////////
+
+// Buf(n) buf; creates an array of n bytes (must be a power of 2).
+// buf[i] returns a reference to the i'th byte with wrap (no out of bounds).
+// buf(i) returns i'th byte back from pos (i > 0)
+// buf.size() returns n.
+
+int pos; // Number of input bytes in buf (not wrapped)
+
+class Buf {
+ Array<U8> b;
+public:
+ Buf(int i=0): b(i) {}
+ void setsize(int i) {
+ if (!i) return;
+ assert(i>0 && (i&(i-1))==0);
+ b.resize(i);
+ }
+ U8& operator[](int i) {
+ return b[i&b.size()-1];
+ }
+ int operator()(int i) const {
+ assert(i>0);
+ return b[pos-i&b.size()-1];
+ }
+ int size() const {
+ return b.size();
+ }
+};
+
+// IntBuf(n) is a buffer of n int (must be a power of 2).
+// intBuf[i] returns a reference to i'th element with wrap.
+
+class IntBuf {
+ Array<int> b;
+public:
+ IntBuf(int i=0): b(i) {}
+ int& operator[](int i) {
+ return b[i&b.size()-1];
+ }
+};
+
+/////////////////////// Global context /////////////////////////
+
+int level=DEFAULT_OPTION; // Compression level 0 to 9
+#define MEM (0x10000<<level)
+int y=0; // Last bit, 0 or 1, set by encoder
+
+// Global context set by Predictor and available to all models.
+int c0=1; // Last 0-7 bits of the partial byte with a leading 1 bit (1-255)
+U32 c4=0; // Last 4 whole bytes, packed. Last byte is bits 0-7.
+int bpos=0; // bits in c0 (0 to 7)
+Buf buf; // Rotating input queue set by Predictor
+
+///////////////////////////// ilog //////////////////////////////
+
+// ilog(x) = round(log2(x) * 16), 0 <= x < 64K
+class Ilog {
+ Array<U8> t;
+public:
+ int operator()(U16 x) const {return t[x];}
+ Ilog();
+} ilog;
+
+// Compute lookup table by numerical integration of 1/x
+Ilog::Ilog(): t(65536) {
+ U32 x=14155776;
+ for (int i=2; i<65536; ++i) {
+ x+=774541002/(i*2-1); // numerator is 2^29/ln 2
+ t[i]=x>>24;
+ }
+}
+
+// llog(x) accepts 32 bits
+inline int llog(U32 x) {
+ if (x>=0x1000000)
+ return 256+ilog(x>>16);
+ else if (x>=0x10000)
+ return 128+ilog(x>>8);
+ else
+ return ilog(x);
+}
+
+///////////////////////// state table ////////////////////////
+
+// State table:
+// nex(state, 0) = next state if bit y is 0, 0 <= state < 256
+// nex(state, 1) = next state if bit y is 1
+// nex(state, 2) = number of zeros in bit history represented by state
+// nex(state, 3) = number of ones represented
+//
+// States represent a bit history within some context.
+// State 0 is the starting state (no bits seen).
+// States 1-30 represent all possible sequences of 1-4 bits.
+// States 31-252 represent a pair of counts, (n0,n1), the number
+// of 0 and 1 bits respectively. If n0+n1 < 16 then there are
+// two states for each pair, depending on if a 0 or 1 was the last
+// bit seen.
+// If n0 and n1 are too large, then there is no state to represent this
+// pair, so another state with about the same ratio of n0/n1 is substituted.
+// Also, when a bit is observed and the count of the opposite bit is large,
+// then part of this count is discarded to favor newer data over old.
+
+#if 1 // change to #if 0 to generate this table at run time (4% slower)
+static const U8 State_table[256][4]={
+ { 1, 2, 0, 0},{ 3, 5, 1, 0},{ 4, 6, 0, 1},{ 7, 10, 2, 0}, // 0-3
+ { 8, 12, 1, 1},{ 9, 13, 1, 1},{ 11, 14, 0, 2},{ 15, 19, 3, 0}, // 4-7
+ { 16, 23, 2, 1},{ 17, 24, 2, 1},{ 18, 25, 2, 1},{ 20, 27, 1, 2}, // 8-11
+ { 21, 28, 1, 2},{ 22, 29, 1, 2},{ 26, 30, 0, 3},{ 31, 33, 4, 0}, // 12-15
+ { 32, 35, 3, 1},{ 32, 35, 3, 1},{ 32, 35, 3, 1},{ 32, 35, 3, 1}, // 16-19
+ { 34, 37, 2, 2},{ 34, 37, 2, 2},{ 34, 37, 2, 2},{ 34, 37, 2, 2}, // 20-23
+ { 34, 37, 2, 2},{ 34, 37, 2, 2},{ 36, 39, 1, 3},{ 36, 39, 1, 3}, // 24-27
+ { 36, 39, 1, 3},{ 36, 39, 1, 3},{ 38, 40, 0, 4},{ 41, 43, 5, 0}, // 28-31
+ { 42, 45, 4, 1},{ 42, 45, 4, 1},{ 44, 47, 3, 2},{ 44, 47, 3, 2}, // 32-35
+ { 46, 49, 2, 3},{ 46, 49, 2, 3},{ 48, 51, 1, 4},{ 48, 51, 1, 4}, // 36-39
+ { 50, 52, 0, 5},{ 53, 43, 6, 0},{ 54, 57, 5, 1},{ 54, 57, 5, 1}, // 40-43
+ { 56, 59, 4, 2},{ 56, 59, 4, 2},{ 58, 61, 3, 3},{ 58, 61, 3, 3}, // 44-47
+ { 60, 63, 2, 4},{ 60, 63, 2, 4},{ 62, 65, 1, 5},{ 62, 65, 1, 5}, // 48-51
+ { 50, 66, 0, 6},{ 67, 55, 7, 0},{ 68, 57, 6, 1},{ 68, 57, 6, 1}, // 52-55
+ { 70, 73, 5, 2},{ 70, 73, 5, 2},{ 72, 75, 4, 3},{ 72, 75, 4, 3}, // 56-59
+ { 74, 77, 3, 4},{ 74, 77, 3, 4},{ 76, 79, 2, 5},{ 76, 79, 2, 5}, // 60-63
+ { 62, 81, 1, 6},{ 62, 81, 1, 6},{ 64, 82, 0, 7},{ 83, 69, 8, 0}, // 64-67
+ { 84, 71, 7, 1},{ 84, 71, 7, 1},{ 86, 73, 6, 2},{ 86, 73, 6, 2}, // 68-71
+ { 44, 59, 5, 3},{ 44, 59, 5, 3},{ 58, 61, 4, 4},{ 58, 61, 4, 4}, // 72-75
+ { 60, 49, 3, 5},{ 60, 49, 3, 5},{ 76, 89, 2, 6},{ 76, 89, 2, 6}, // 76-79
+ { 78, 91, 1, 7},{ 78, 91, 1, 7},{ 80, 92, 0, 8},{ 93, 69, 9, 0}, // 80-83
+ { 94, 87, 8, 1},{ 94, 87, 8, 1},{ 96, 45, 7, 2},{ 96, 45, 7, 2}, // 84-87
+ { 48, 99, 2, 7},{ 48, 99, 2, 7},{ 88,101, 1, 8},{ 88,101, 1, 8}, // 88-91
+ { 80,102, 0, 9},{103, 69,10, 0},{104, 87, 9, 1},{104, 87, 9, 1}, // 92-95
+ {106, 57, 8, 2},{106, 57, 8, 2},{ 62,109, 2, 8},{ 62,109, 2, 8}, // 96-99
+ { 88,111, 1, 9},{ 88,111, 1, 9},{ 80,112, 0,10},{113, 85,11, 0}, // 100-103
+ {114, 87,10, 1},{114, 87,10, 1},{116, 57, 9, 2},{116, 57, 9, 2}, // 104-107
+ { 62,119, 2, 9},{ 62,119, 2, 9},{ 88,121, 1,10},{ 88,121, 1,10}, // 108-111
+ { 90,122, 0,11},{123, 85,12, 0},{124, 97,11, 1},{124, 97,11, 1}, // 112-115
+ {126, 57,10, 2},{126, 57,10, 2},{ 62,129, 2,10},{ 62,129, 2,10}, // 116-119
+ { 98,131, 1,11},{ 98,131, 1,11},{ 90,132, 0,12},{133, 85,13, 0}, // 120-123
+ {134, 97,12, 1},{134, 97,12, 1},{136, 57,11, 2},{136, 57,11, 2}, // 124-127
+ { 62,139, 2,11},{ 62,139, 2,11},{ 98,141, 1,12},{ 98,141, 1,12}, // 128-131
+ { 90,142, 0,13},{143, 95,14, 0},{144, 97,13, 1},{144, 97,13, 1}, // 132-135
+ { 68, 57,12, 2},{ 68, 57,12, 2},{ 62, 81, 2,12},{ 62, 81, 2,12}, // 136-139
+ { 98,147, 1,13},{ 98,147, 1,13},{100,148, 0,14},{149, 95,15, 0}, // 140-143
+ {150,107,14, 1},{150,107,14, 1},{108,151, 1,14},{108,151, 1,14}, // 144-147
+ {100,152, 0,15},{153, 95,16, 0},{154,107,15, 1},{108,155, 1,15}, // 148-151
+ {100,156, 0,16},{157, 95,17, 0},{158,107,16, 1},{108,159, 1,16}, // 152-155
+ {100,160, 0,17},{161,105,18, 0},{162,107,17, 1},{108,163, 1,17}, // 156-159
+ {110,164, 0,18},{165,105,19, 0},{166,117,18, 1},{118,167, 1,18}, // 160-163
+ {110,168, 0,19},{169,105,20, 0},{170,117,19, 1},{118,171, 1,19}, // 164-167
+ {110,172, 0,20},{173,105,21, 0},{174,117,20, 1},{118,175, 1,20}, // 168-171
+ {110,176, 0,21},{177,105,22, 0},{178,117,21, 1},{118,179, 1,21}, // 172-175
+ {110,180, 0,22},{181,115,23, 0},{182,117,22, 1},{118,183, 1,22}, // 176-179
+ {120,184, 0,23},{185,115,24, 0},{186,127,23, 1},{128,187, 1,23}, // 180-183
+ {120,188, 0,24},{189,115,25, 0},{190,127,24, 1},{128,191, 1,24}, // 184-187
+ {120,192, 0,25},{193,115,26, 0},{194,127,25, 1},{128,195, 1,25}, // 188-191
+ {120,196, 0,26},{197,115,27, 0},{198,127,26, 1},{128,199, 1,26}, // 192-195
+ {120,200, 0,27},{201,115,28, 0},{202,127,27, 1},{128,203, 1,27}, // 196-199
+ {120,204, 0,28},{205,115,29, 0},{206,127,28, 1},{128,207, 1,28}, // 200-203
+ {120,208, 0,29},{209,125,30, 0},{210,127,29, 1},{128,211, 1,29}, // 204-207
+ {130,212, 0,30},{213,125,31, 0},{214,137,30, 1},{138,215, 1,30}, // 208-211
+ {130,216, 0,31},{217,125,32, 0},{218,137,31, 1},{138,219, 1,31}, // 212-215
+ {130,220, 0,32},{221,125,33, 0},{222,137,32, 1},{138,223, 1,32}, // 216-219
+ {130,224, 0,33},{225,125,34, 0},{226,137,33, 1},{138,227, 1,33}, // 220-223
+ {130,228, 0,34},{229,125,35, 0},{230,137,34, 1},{138,231, 1,34}, // 224-227
+ {130,232, 0,35},{233,125,36, 0},{234,137,35, 1},{138,235, 1,35}, // 228-231
+ {130,236, 0,36},{237,125,37, 0},{238,137,36, 1},{138,239, 1,36}, // 232-235
+ {130,240, 0,37},{241,125,38, 0},{242,137,37, 1},{138,243, 1,37}, // 236-239
+ {130,244, 0,38},{245,135,39, 0},{246,137,38, 1},{138,247, 1,38}, // 240-243
+ {140,248, 0,39},{249,135,40, 0},{250, 69,39, 1},{ 80,251, 1,39}, // 244-247
+ {140,252, 0,40},{249,135,41, 0},{250, 69,40, 1},{ 80,251, 1,40}, // 248-251
+ {140,252, 0,41}}; // 252, 253-255 are reserved
+
+#define nex(state,sel) State_table[state][sel]
+
+// The code used to generate the above table at run time (4% slower).
+// To print the table, uncomment the 4 lines of print statements below.
+// In this code x,y = n0,n1 is the number of 0,1 bits represented by a state.
+#else
+
+class StateTable {
+ Array<U8> ns; // state*4 -> next state if 0, if 1, n0, n1
+ enum {B=5, N=64}; // sizes of b, t
+ static const int b[B]; // x -> max y, y -> max x
+ static U8 t[N][N][2]; // x,y -> state number, number of states
+ int num_states(int x, int y); // compute t[x][y][1]
+ void discount(int& x); // set new value of x after 1 or y after 0
+ void next_state(int& x, int& y, int b); // new (x,y) after bit b
+public:
+ int operator()(int state, int sel) {return ns[state*4+sel];}
+ StateTable();
+} nex;
+
+const int StateTable::b[B]={42,41,13,6,5}; // x -> max y, y -> max x
+U8 StateTable::t[N][N][2];
+
+int StateTable::num_states(int x, int y) {
+ if (x<y) return num_states(y, x);
+ if (x<0 || y<0 || x>=N || y>=N || y>=B || x>=b[y]) return 0;
+
+ // States 0-30 are a history of the last 0-4 bits
+ if (x+y<=4) { // x+y choose x = (x+y)!/x!y!
+ int r=1;
+ for (int i=x+1; i<=x+y; ++i) r*=i;
+ for (int i=2; i<=y; ++i) r/=i;
+ return r;
+ }
+
+ // States 31-255 represent a 0,1 count and possibly the last bit
+ // if the state is reachable by either a 0 or 1.
+ else
+ return 1+(y>0 && x+y<16);
+}
+
+// New value of count x if the opposite bit is observed
+void StateTable::discount(int& x) {
+ if (x>2) x=ilog(x)/6-1;
+}
+
+// compute next x,y (0 to N) given input b (0 or 1)
+void StateTable::next_state(int& x, int& y, int b) {
+ if (x<y)
+ next_state(y, x, 1-b);
+ else {
+ if (b) {
+ ++y;
+ discount(x);
+ }
+ else {
+ ++x;
+ discount(y);
+ }
+ while (!t[x][y][1]) {
+ if (y<2) --x;
+ else {
+ x=(x*(y-1)+(y/2))/y;
+ --y;
+ }
+ }
+ }
+}
+
+// Initialize next state table ns[state*4] -> next if 0, next if 1, x, y
+StateTable::StateTable(): ns(1024) {
+
+ // Assign states
+ int state=0;
+ for (int i=0; i<256; ++i) {
+ for (int y=0; y<=i; ++y) {
+ int x=i-y;
+ int n=num_states(x, y);
+ if (n) {
+ t[x][y][0]=state;
+ t[x][y][1]=n;
+ state+=n;
+ }
+ }
+ }
+
+ // Print/generate next state table
+ state=0;
+ for (int i=0; i<N; ++i) {
+ for (int y=0; y<=i; ++y) {
+ int x=i-y;
+ for (int k=0; k<t[x][y][1]; ++k) {
+ int x0=x, y0=y, x1=x, y1=y; // next x,y for input 0,1
+ int ns0=0, ns1=0;
+ if (state<15) {
+ ++x0;
+ ++y1;
+ ns0=t[x0][y0][0]+state-t[x][y][0];
+ ns1=t[x1][y1][0]+state-t[x][y][0];
+ if (x>0) ns1+=t[x-1][y+1][1];
+ ns[state*4]=ns0;
+ ns[state*4+1]=ns1;
+ ns[state*4+2]=x;
+ ns[state*4+3]=y;
+ }
+ else if (t[x][y][1]) {
+ next_state(x0, y0, 0);
+ next_state(x1, y1, 1);
+ ns[state*4]=ns0=t[x0][y0][0];
+ ns[state*4+1]=ns1=t[x1][y1][0]+(t[x1][y1][1]>1);
+ ns[state*4+2]=x;
+ ns[state*4+3]=y;
+ }
+ // uncomment to print table above
+// printf("{%3d,%3d,%2d,%2d},", ns[state*4], ns[state*4+1],
+// ns[state*4+2], ns[state*4+3]);
+// if (state%4==3) printf(" // %d-%d\n ", state-3, state);
+ assert(state>=0 && state<256);
+ assert(t[x][y][1]>0);
+ assert(t[x][y][0]<=state);
+ assert(t[x][y][0]+t[x][y][1]>state);
+ assert(t[x][y][1]<=6);
+ assert(t[x0][y0][1]>0);
+ assert(t[x1][y1][1]>0);
+ assert(ns0-t[x0][y0][0]<t[x0][y0][1]);
+ assert(ns0-t[x0][y0][0]>=0);
+ assert(ns1-t[x1][y1][0]<t[x1][y1][1]);
+ assert(ns1-t[x1][y1][0]>=0);
+ ++state;
+ }
+ }
+ }
+// printf("%d states\n", state); exit(0); // uncomment to print table above
+}
+
+#endif
+
+///////////////////////////// Squash //////////////////////////////
+
+// return p = 1/(1 + exp(-d)), d scaled by 8 bits, p scaled by 12 bits
+int squash(int d) {
+ static const int t[33]={
+ 1,2,3,6,10,16,27,45,73,120,194,310,488,747,1101,
+ 1546,2047,2549,2994,3348,3607,3785,3901,3975,4022,
+ 4050,4068,4079,4085,4089,4092,4093,4094};
+ if (d>2047) return 4095;
+ if (d<-2047) return 0;
+ int w=d&127;
+ d=(d>>7)+16;
+ return (t[d]*(128-w)+t[(d+1)]*w+64) >> 7;
+}
+
+//////////////////////////// Stretch ///////////////////////////////
+
+// Inverse of squash. d = ln(p/(1-p)), d scaled by 8 bits, p by 12 bits.
+// d has range -2047 to 2047 representing -8 to 8. p has range 0 to 4095.
+
+class Stretch {
+ Array<short> t;
+public:
+ Stretch();
+ int operator()(int p) const {
+ assert(p>=0 && p<4096);
+ return t[p];
+ }
+} stretch;
+
+Stretch::Stretch(): t(4096) {
+ int pi=0;
+ for (int x=-2047; x<=2047; ++x) { // invert squash()
+ int i=squash(x);
+ for (int j=pi; j<=i; ++j)
+ t[j]=x;
+ pi=i+1;
+ }
+ t[4095]=2047;
+}
+
+//////////////////////////// Mixer /////////////////////////////
+
+// Mixer m(N, M, S=1, w=0) combines models using M neural networks with
+// N inputs each, of which up to S may be selected. If S > 1 then
+// the outputs of these neural networks are combined using another
+// neural network (with parameters S, 1, 1). If S = 1 then the
+// output is direct. The weights are initially w (+-32K).
+// It is used as follows:
+// m.update() trains the network where the expected output is the
+// last bit (in the global variable y).
+// m.add(stretch(p)) inputs prediction from one of N models. The
+// prediction should be positive to predict a 1 bit, negative for 0,
+// nominally +-256 to +-2K. The maximum allowed value is +-32K but
+// using such large values may cause overflow if N is large.
+// m.set(cxt, range) selects cxt as one of 'range' neural networks to
+// use. 0 <= cxt < range. Should be called up to S times such
+// that the total of the ranges is <= M.
+// m.p() returns the output prediction that the next bit is 1 as a
+// 12 bit number (0 to 4095).
+
+// dot_product returns dot product t*w of n elements. n is rounded
+// up to a multiple of 8. Result is scaled down by 8 bits.
+#ifdef NOASM // no assembly language
+int dot_product(short *t, short *w, int n) {
+ int sum=0;
+ n=(n+7)&-8;
+ for (int i=0; i<n; i+=2)
+ sum+=(t[i]*w[i]+t[i+1]*w[i+1]) >> 8;
+ return sum;
+}
+#else // The NASM version uses MMX and is about 8 times faster.
+extern "C" int dot_product(short *t, short *w, int n); // in NASM
+#endif
+
+// Train neural network weights w[n] given inputs t[n] and err.
+// w[i] += t[i]*err, i=0..n-1. t, w, err are signed 16 bits (+- 32K).
+// err is scaled 16 bits (representing +- 1/2). w[i] is clamped to +- 32K
+// and rounded. n is rounded up to a multiple of 8.
+#ifdef NOASM
+void train(short *t, short *w, int n, int err) {
+ n=(n+7)&-8;
+ for (int i=0; i<n; ++i) {
+ int wt=w[i]+((t[i]*err*2>>16)+1>>1);
+ if (wt<-32768) wt=-32768;
+ if (wt>32767) wt=32767;
+ w[i]=wt;
+ }
+}
+#else
+extern "C" void train(short *t, short *w, int n, int err); // in NASM
+#endif
+
+class Mixer {
+ const int N, M, S; // max inputs, max contexts, max context sets
+ Array<short, 16> tx; // N inputs from add()
+ Array<short, 16> wx; // N*M weights
+ Array<int> cxt; // S contexts
+ int ncxt; // number of contexts (0 to S)
+ int base; // offset of next context
+ int nx; // Number of inputs in tx, 0 to N
+ Array<int> pr; // last result (scaled 12 bits)
+ Mixer* mp; // points to a Mixer to combine results
+public:
+ Mixer(int n, int m, int s=1, int w=0);
+
+ // Adjust weights to minimize coding cost of last prediction
+ void update() {
+ for (int i=0; i<ncxt; ++i) {
+ int err=((y<<12)-pr[i])*7;
+ assert(err>=-32768 && err<32768);
+ if (err) train(&tx[0], &wx[cxt[i]*N], nx, err);
+ }
+ nx=base=ncxt=0;
+ }
+
+ // Input x (call up to N times)
+ void add(int x) {
+ assert(nx<N);
+ tx[nx++]=x;
+ }
+
+ // Set a context (call S times, sum of ranges <= M)
+ void set(int cx, int range) {
+ assert(range>=0);
+ assert(ncxt<S);
+ assert(cx>=0);
+ assert(base+cx<M);
+ cxt[ncxt++]=base+cx;
+ base+=range;
+ }
+
+ // predict next bit
+ int p() {
+ while (nx&7) tx[nx++]=0; // pad
+ if (mp) { // combine outputs
+ mp->update();
+ for (int i=0; i<ncxt; ++i) {
+ pr[i]=squash(dot_product(&tx[0], &wx[cxt[i]*N], nx)>>5);
+ mp->add(stretch(pr[i]));
+ }
+ mp->set(0, 1);
+ return mp->p();
+ }
+ else { // S=1 context
+ return pr[0]=squash(dot_product(&tx[0], &wx[0], nx)>>8);
+ }
+ }
+ ~Mixer();
+};
+
+Mixer::~Mixer() {
+ delete mp;
+}
+
+
+Mixer::Mixer(int n, int m, int s, int w):
+ N((n+7)&-8), M(m), S(s), tx(N), wx(N*M),
+ cxt(S), ncxt(0), base(0), nx(0), pr(S), mp(0) {
+ assert(n>0 && N>0 && (N&7)==0 && M>0);
+ int i;
+ for ( i=0; i<S; ++i)
+ pr[i]=2048;
+ for ( i=0; i<N*M; ++i)
+ wx[i]=w;
+ if (S>1) mp=new Mixer(S, 1, 1, 0x7fff);
+}
+
+//////////////////////////// APM1 //////////////////////////////
+
+// APM1 maps a probability and a context into a new probability
+// that bit y will next be 1. After each guess it updates
+// its state to improve future guesses. Methods:
+//
+// APM1 a(N) creates with N contexts, uses 66*N bytes memory.
+// a.p(pr, cx, rate=7) returned adjusted probability in context cx (0 to
+// N-1). rate determines the learning rate (smaller = faster, default 7).
+// Probabilities are scaled 12 bits (0-4095).
+
+class APM1 {
+ int index; // last p, context
+ const int N; // number of contexts
+ Array<U16> t; // [N][33]: p, context -> p
+public:
+ APM1(int n);
+ int p(int pr=2048, int cxt=0, int rate=7) {
+ assert(pr>=0 && pr<4096 && cxt>=0 && cxt<N && rate>0 && rate<32);
+ pr=stretch(pr);
+ int g=(y<<16)+(y<<rate)-y-y;
+ t[index] += g-t[index] >> rate;
+ t[index+1] += g-t[index+1] >> rate;
+ const int w=pr&127; // interpolation weight (33 points)
+ index=(pr+2048>>7)+cxt*33;
+ return t[index]*(128-w)+t[index+1]*w >> 11;
+ }
+};
+
+// maps p, cxt -> p initially
+APM1::APM1(int n): index(0), N(n), t(n*33) {
+ for (int i=0; i<N; ++i)
+ for (int j=0; j<33; ++j)
+ t[i*33+j] = i==0 ? squash((j-16)*128)*16 : t[j];
+}
+
+//////////////////////////// StateMap, APM //////////////////////////
+
+// A StateMap maps a context to a probability. Methods:
+//
+// Statemap sm(n) creates a StateMap with n contexts using 4*n bytes memory.
+// sm.p(y, cx, limit) converts state cx (0..n-1) to a probability (0..4095).
+// that the next y=1, updating the previous prediction with y (0..1).
+// limit (1..1023, default 1023) is the maximum count for computing a
+// prediction. Larger values are better for stationary sources.
+
+static int dt[1024]; // i -> 16K/(i+3)
+
+class StateMap {
+protected:
+ const int N; // Number of contexts
+ int cxt; // Context of last prediction
+ Array<U32> t; // cxt -> prediction in high 22 bits, count in low 10 bits
+ inline void update(int limit) {
+ assert(cxt>=0 && cxt<N);
+ U32 *p=&t[cxt], p0=p[0];
+ int n=p0&1023, pr=p0>>10; // count, prediction
+ if (n<limit) ++p0;
+ else p0=p0&0xfffffc00|limit;;
+ p0+=(((y<<22)-pr)>>3)*dt[n]&0xfffffc00;
+ p[0]=p0;
+ }
+
+public:
+ StateMap(int n=256);
+
+ // update bit y (0..1), predict next bit in context cx
+ int p(int cx, int limit=1023) {
+ assert(cx>=0 && cx<N);
+ assert(limit>0 && limit<1024);
+ update(limit);
+ return t[cxt=cx]>>20;
+ }
+};
+
+StateMap::StateMap(int n): N(n), cxt(0), t(n) {
+ for (int i=0; i<N; ++i)
+ t[i]=1<<31;
+}
+
+// An APM maps a probability and a context to a new probability. Methods:
+//
+// APM a(n) creates with n contexts using 96*n bytes memory.
+// a.pp(y, pr, cx, limit) updates and returns a new probability (0..4095)
+// like with StateMap. pr (0..4095) is considered part of the context.
+// The output is computed by interpolating pr into 24 ranges nonlinearly
+// with smaller ranges near the ends. The initial output is pr.
+// y=(0..1) is the last bit. cx=(0..n-1) is the other context.
+// limit=(0..1023) defaults to 255.
+
+class APM: public StateMap {
+public:
+ APM(int n);
+ int p(int pr, int cx, int limit=255) {
+ // assert(y>>1==0);
+ assert(pr>=0 && pr<4096);
+ assert(cx>=0 && cx<N/24);
+ assert(limit>0 && limit<1024);
+ update(limit);
+ pr=(stretch(pr)+2048)*23;
+ int wt=pr&0xfff; // interpolation weight of next element
+ cx=cx*24+(pr>>12);
+ assert(cx>=0 && cx<N-1);
+ cxt=cx+(wt>>11);
+ pr=(t[cx]>>13)*(0x1000-wt)+(t[cx+1]>>13)*wt>>19;
+ return pr;
+ }
+};
+
+APM::APM(int n): StateMap(n*24) {
+ for (int i=0; i<N; ++i) {
+ int p=((i%24*2+1)*4096)/48-2048;
+ t[i]=(U32(squash(p))<<20)+6;
+ }
+}
+
+
+//////////////////////////// hash //////////////////////////////
+
+// Hash 2-5 ints.
+inline U32 hash(U32 a, U32 b, U32 c=0xffffffff, U32 d=0xffffffff,
+ U32 e=0xffffffff) {
+ U32 h=a*200002979u+b*30005491u+c*50004239u+d*70004807u+e*110002499u;
+ return h^h>>9^a>>2^b>>3^c>>4^d>>5^e>>6;
+}
+
+///////////////////////////// BH ////////////////////////////////
+
+// A BH maps a 32 bit hash to an array of B bytes (checksum and B-2 values)
+//
+// BH bh(N); creates N element table with B bytes each.
+// N must be a power of 2. The first byte of each element is
+// reserved for a checksum to detect collisions. The remaining
+// B-1 bytes are values, prioritized by the first value. This
+// byte is 0 to mark an unused element.
+//
+// bh[i] returns a pointer to the i'th element, such that
+// bh[i][0] is a checksum of i, bh[i][1] is the priority, and
+// bh[i][2..B-1] are other values (0-255).
+// The low lg(n) bits as an index into the table.
+// If a collision is detected, up to M nearby locations in the same
+// cache line are tested and the first matching checksum or
+// empty element is returned.
+// If no match or empty element is found, then the lowest priority
+// element is replaced.
+
+// 2 byte checksum with LRU replacement (except last 2 by priority)
+template <int B> class BH {
+ enum {M=8}; // search limit
+ Array<U8, 64> t; // elements
+ U32 n; // size-1
+public:
+ BH(int i): t(i*B), n(i-1) {
+ assert(B>=2 && i>0 && (i&(i-1))==0); // size a power of 2?
+ }
+ U8* operator[](U32 i);
+};
+
+template <int B>
+inline U8* BH<B>::operator[](U32 i) {
+ int chk=(i>>16^i)&0xffff;
+ i=i*M&n;
+ U8 *p;
+ U16 *cp;
+ int j;
+ for (j=0; j<M; ++j) {
+ p=&t[(i+j)*B];
+ cp=(U16*)p;
+ if (p[2]==0) {*cp=chk;break;}
+ if (*cp==chk) break; // found
+ }
+ if (j==0) return p+1; // front
+ static U8 tmp[B]; // element to move to front
+ if (j==M) {
+ --j;
+ memset(tmp, 0, B);
+ *(U16*)tmp=chk;
+ if (M>2 && t[(i+j)*B+2]>t[(i+j-1)*B+2]) --j;
+ }
+ else memcpy(tmp, cp, B);
+ memmove(&t[(i+1)*B], &t[i*B], j*B);
+ memcpy(&t[i*B], tmp, B);
+ return &t[i*B+1];
+}
+
+/////////////////////////// ContextMap /////////////////////////
+//
+// A ContextMap maps contexts to a bit histories and makes predictions
+// to a Mixer. Methods common to all classes:
+//
+// ContextMap cm(M, C); creates using about M bytes of memory (a power
+// of 2) for C contexts.
+// cm.set(cx); sets the next context to cx, called up to C times
+// cx is an arbitrary 32 bit value that identifies the context.
+// It should be called before predicting the first bit of each byte.
+// cm.mix(m) updates Mixer m with the next prediction. Returns 1
+// if context cx is found, else 0. Then it extends all the contexts with
+// global bit y. It should be called for every bit:
+//
+// if (bpos==0)
+// for (int i=0; i<C; ++i) cm.set(cxt[i]);
+// cm.mix(m);
+//
+// The different types are as follows:
+//
+// - RunContextMap. The bit history is a count of 0-255 consecutive
+// zeros or ones. Uses 4 bytes per whole byte context. C=1.
+// The context should be a hash.
+// - SmallStationaryContextMap. 0 <= cx < M/512.
+// The state is a 16-bit probability that is adjusted after each
+// prediction. C=1.
+// - ContextMap. For large contexts, C >= 1. Context need not be hashed.
+
+// Predict to mixer m from bit history state s, using sm to map s to
+// a probability.
+inline int mix2(Mixer& m, int s, StateMap& sm) {
+ int p1=sm.p(s);
+ int n0=-!nex(s,2);
+ int n1=-!nex(s,3);
+ int st=stretch(p1)>>2;
+ m.add(st);
+ p1>>=4;
+ int p0=255-p1;
+ m.add(p1-p0);
+ m.add(st*(n1-n0));
+ m.add((p1&n0)-(p0&n1));
+ m.add((p1&n1)-(p0&n0));
+ return s>0;
+}
+
+// A RunContextMap maps a context into the next byte and a repeat
+// count up to M. Size should be a power of 2. Memory usage is 3M/4.
+class RunContextMap {
+ BH<4> t;
+ U8* cp;
+public:
+ RunContextMap(int m): t(m/4) {cp=t[0]+1;}
+ void set(U32 cx) { // update count
+ if (cp[0]==0 || cp[1]!=buf(1)) cp[0]=1, cp[1]=buf(1);
+ else if (cp[0]<255) ++cp[0];
+ cp=t[cx]+1;
+ }
+ int p() { // predict next bit
+ if (cp[1]+256>>8-bpos==c0)
+ return ((cp[1]>>7-bpos&1)*2-1)*ilog(cp[0]+1)*8;
+ else
+ return 0;
+ }
+ int mix(Mixer& m) { // return run length
+ m.add(p());
+ return cp[0]!=0;
+ }
+};
+
+// Context is looked up directly. m=size is power of 2 in bytes.
+// Context should be < m/512. High bits are discarded.
+class SmallStationaryContextMap {
+ Array<U16> t;
+ int cxt;
+ U16 *cp;
+public:
+ SmallStationaryContextMap(int m): t(m/2), cxt(0) {
+ assert((m/2&m/2-1)==0); // power of 2?
+ for (int i=0; i<t.size(); ++i)
+ t[i]=32768;
+ cp=&t[0];
+ }
+ void set(U32 cx) {
+ cxt=cx*256&t.size()-256;
+ }
+ void mix(Mixer& m, int rate=7) {
+ *cp += (y<<16)-*cp+(1<<rate-1) >> rate;
+ cp=&t[cxt+c0];
+ m.add(stretch(*cp>>4));
+ }
+};
+
+// Context map for large contexts. Most modeling uses this type of context
+// map. It includes a built in RunContextMap to predict the last byte seen
+// in the same context, and also bit-level contexts that map to a bit
+// history state.
+//
+// Bit histories are stored in a hash table. The table is organized into
+// 64-byte buckets alinged on cache page boundaries. Each bucket contains
+// a hash chain of 7 elements, plus a 2 element queue (packed into 1 byte)
+// of the last 2 elements accessed for LRU replacement. Each element has
+// a 2 byte checksum for detecting collisions, and an array of 7 bit history
+// states indexed by the last 0 to 2 bits of context. The buckets are indexed
+// by a context ending after 0, 2, or 5 bits of the current byte. Thus, each
+// byte modeled results in 3 main memory accesses per context, with all other
+// accesses to cache.
+//
+// On bits 0, 2 and 5, the context is updated and a new bucket is selected.
+// The most recently accessed element is tried first, by comparing the
+// 16 bit checksum, then the 7 elements are searched linearly. If no match
+// is found, then the element with the lowest priority among the 5 elements
+// not in the LRU queue is replaced. After a replacement, the queue is
+// emptied (so that consecutive misses favor a LFU replacement policy).
+// In all cases, the found/replaced element is put in the front of the queue.
+//
+// The priority is the state number of the first element (the one with 0
+// additional bits of context). The states are sorted by increasing n0+n1
+// (number of bits seen), implementing a LFU replacement policy.
+//
+// When the context ends on a byte boundary (bit 0), only 3 of the 7 bit
+// history states are used. The remaining 4 bytes implement a run model
+// as follows: <count:7,d:1> <b1> <unused> <unused> where <b1> is the last byte
+// seen, possibly repeated. <count:7,d:1> is a 7 bit count and a 1 bit
+// flag (represented by count * 2 + d). If d=0 then <count> = 1..127 is the
+// number of repeats of <b1> and no other bytes have been seen. If d is 1 then
+// other byte values have been seen in this context prior to the last <count>
+// copies of <b1>.
+//
+// As an optimization, the last two hash elements of each byte (representing
+// contexts with 2-7 bits) are not updated until a context is seen for
+// a second time. This is indicated by <count,d> = <1,0> (2). After update,
+// <count,d> is updated to <2,0> or <1,1> (4 or 3).
+
+class ContextMap {
+ const int C; // max number of contexts
+ class E { // hash element, 64 bytes
+ U16 chk[7]; // byte context checksums
+ U8 last; // last 2 accesses (0-6) in low, high nibble
+ public:
+ U8 bh[7][7]; // byte context, 3-bit context -> bit history state
+ // bh[][0] = 1st bit, bh[][1,2] = 2nd bit, bh[][3..6] = 3rd bit
+ // bh[][0] is also a replacement priority, 0 = empty
+ U8* get(U16 chk); // Find element (0-6) matching checksum.
+ // If not found, insert or replace lowest priority (not last).
+ };
+ Array<E, 64> t; // bit histories for bits 0-1, 2-4, 5-7
+ // For 0-1, also contains a run count in bh[][4] and value in bh[][5]
+ // and pending update count in bh[7]
+ Array<U8*> cp; // C pointers to current bit history
+ Array<U8*> cp0; // First element of 7 element array containing cp[i]
+ Array<U32> cxt; // C whole byte contexts (hashes)
+ Array<U8*> runp; // C [0..3] = count, value, unused, unused
+ StateMap *sm; // C maps of state -> p
+ int cn; // Next context to set by set()
+ void update(U32 cx, int c); // train model that context cx predicts c
+ int mix1(Mixer& m, int cc, int bp, int c1, int y1);
+ // mix() with global context passed as arguments to improve speed.
+public:
+ ContextMap(int m, int c=1); // m = memory in bytes, a power of 2, C = c
+ ~ContextMap();
+ void set(U32 cx, int next=-1); // set next whole byte context to cx
+ // if next is 0 then set order does not matter
+ int mix(Mixer& m) {return mix1(m, c0, bpos, buf(1), y);}
+};
+
+// Find or create hash element matching checksum ch
+inline U8* ContextMap::E::get(U16 ch) {
+ if (chk[last&15]==ch) return &bh[last&15][0];
+ int b=0xffff, bi=0;
+ for (int i=0; i<7; ++i) {
+ if (chk[i]==ch) return last=last<<4|i, (U8*)&bh[i][0];
+ int pri=bh[i][0];
+ if (pri<b && (last&15)!=i && last>>4!=i) b=pri, bi=i;
+ }
+ return last=0xf0|bi, chk[bi]=ch, (U8*)memset(&bh[bi][0], 0, 7);
+}
+
+// Construct using m bytes of memory for c contexts
+ContextMap::ContextMap(int m, int c): C(c), t(m>>6), cp(c), cp0(c),
+ cxt(c), runp(c), cn(0) {
+ assert(m>=64 && (m&m-1)==0); // power of 2?
+ assert(sizeof(E)==64);
+ sm=new StateMap[C];
+ for (int i=0; i<C; ++i) {
+ cp0[i]=cp[i]=&t[0].bh[0][0];
+ runp[i]=cp[i]+3;
+ }
+}
+
+ContextMap::~ContextMap() {
+ delete[] sm;
+}
+
+// Set the i'th context to cx
+inline void ContextMap::set(U32 cx, int next) {
+ int i=cn++;
+ i&=next;
+ assert(i>=0 && i<C);
+ cx=cx*987654323+i; // permute (don't hash) cx to spread the distribution
+ cx=cx<<16|cx>>16;
+ cxt[i]=cx*123456791+i;
+}
+
+// Update the model with bit y1, and predict next bit to mixer m.
+// Context: cc=c0, bp=bpos, c1=buf(1), y1=y.
+int ContextMap::mix1(Mixer& m, int cc, int bp, int c1, int y1) {
+
+ // Update model with y
+ int result=0;
+ for (int i=0; i<cn; ++i) {
+ if (cp[i]) {
+ assert(cp[i]>=&t[0].bh[0][0] && cp[i]<=&t[t.size()-1].bh[6][6]);
+ assert((long(cp[i])&63)>=15);
+ int ns=nex(*cp[i], y1);
+ if (ns>=204 && rnd() << (452-ns>>3)) ns-=4; // probabilistic increment
+ *cp[i]=ns;
+ }
+
+ // Update context pointers
+ if (bpos>1 && runp[i][0]==0)
+ {
+ cp[i]=0;
+ }
+ else
+ {
+ switch(bpos)
+ {
+ case 1: case 3: case 6: cp[i]=cp0[i]+1+(cc&1); break;
+ case 4: case 7: cp[i]=cp0[i]+3+(cc&3); break;
+ case 2: case 5: cp0[i]=cp[i]=t[cxt[i]+cc&t.size()-1].get(cxt[i]>>16); break;
+ default:
+ {
+ cp0[i]=cp[i]=t[cxt[i]+cc&t.size()-1].get(cxt[i]>>16);
+ // Update pending bit histories for bits 2-7
+ if (cp0[i][3]==2) {
+ const int c=cp0[i][4]+256;
+ U8 *p=t[cxt[i]+(c>>6)&t.size()-1].get(cxt[i]>>16);
+ p[0]=1+((c>>5)&1);
+ p[1+((c>>5)&1)]=1+((c>>4)&1);
+ p[3+((c>>4)&3)]=1+((c>>3)&1);
+ p=t[cxt[i]+(c>>3)&t.size()-1].get(cxt[i]>>16);
+ p[0]=1+((c>>2)&1);
+ p[1+((c>>2)&1)]=1+((c>>1)&1);
+ p[3+((c>>1)&3)]=1+(c&1);
+ cp0[i][6]=0;
+ }
+ // Update run count of previous context
+ if (runp[i][0]==0) // new context
+ runp[i][0]=2, runp[i][1]=c1;
+ else if (runp[i][1]!=c1) // different byte in context
+ runp[i][0]=1, runp[i][1]=c1;
+ else if (runp[i][0]<254) // same byte in context
+ runp[i][0]+=2;
+ else if (runp[i][0]==255)
+ runp[i][0]=128;
+ runp[i]=cp0[i]+3;
+ } break;
+ }
+ }
+
+ // predict from last byte in context
+ if (runp[i][1]+256>>8-bp==cc) {
+ int rc=runp[i][0]; // count*2, +1 if 2 different bytes seen
+ int b=(runp[i][1]>>7-bp&1)*2-1; // predicted bit + for 1, - for 0
+ int c=ilog(rc+1)<<2+(~rc&1);
+ m.add(b*c);
+ }
+ else
+ m.add(0);
+
+ // predict from bit context
+ if(cp[i])
+ {
+ result+=mix2(m, *cp[i], sm[i]);
+ }
+ else
+ {
+ mix2(m, 0, sm[i]);
+ }
+
+
+ }
+ if (bp==7) cn=0;
+ return result;
+}
+
+//////////////////////////// Models //////////////////////////////
+
+// All of the models below take a Mixer as a parameter and write
+// predictions to it.
+
+//////////////////////////// matchModel ///////////////////////////
+
+// matchModel() finds the longest matching context and returns its length
+
+int matchModel(Mixer& m) {
+ const int MAXLEN=65534; // longest allowed match + 1
+ static Array<int> t(MEM); // hash table of pointers to contexts
+ static int h=0; // hash of last 7 bytes
+ static int ptr=0; // points to next byte of match if any
+ static int len=0; // length of match, or 0 if no match
+ static int result=0;
+
+ static SmallStationaryContextMap scm1(0x20000);
+
+ if (!bpos) {
+ h=h*997*8+buf(1)+1&t.size()-1; // update context hash
+ if (len) ++len, ++ptr;
+ else { // find match
+ ptr=t[h];
+ if (ptr && pos-ptr<buf.size())
+ while (buf(len+1)==buf[ptr-len-1] && len<MAXLEN) ++len;
+ }
+ t[h]=pos; // update hash table
+ result=len;
+// if (result>0 && !(result&0xfff)) printf("pos=%d len=%d ptr=%d\n", pos, len, ptr);
+ scm1.set(pos);
+ }
+
+ // predict
+ if (len)
+ {
+ if (buf(1)==buf[ptr-1] && c0==buf[ptr]+256>>8-bpos)
+ {
+ if (len>MAXLEN) len=MAXLEN;
+ if (buf[ptr]>>7-bpos&1)
+ {
+ m.add(ilog(len)<<2);
+ m.add(min(len, 32)<<6);
+ }
+ else
+ {
+ m.add(-(ilog(len)<<2));
+ m.add(-(min(len, 32)<<6));
+ }
+ }
+ else
+ {
+ len=0;
+ m.add(0);
+ m.add(0);
+ }
+ }
+ else
+ {
+ m.add(0);
+ m.add(0);
+ }
+
+ scm1.mix(m);
+ return result;
+}
+
+//////////////////////////// picModel //////////////////////////
+
+// Model a 1728 by 2376 2-color CCITT bitmap image, left to right scan,
+// MSB first (216 bytes per row, 513216 bytes total). Insert predictions
+// into m.
+
+void picModel(Mixer& m) {
+ static U32 r0, r1, r2, r3; // last 4 rows, bit 8 is over current pixel
+ static Array<U8> t(0x10200); // model: cxt -> state
+ const int N=3; // number of contexts
+ static int cxt[N]; // contexts
+ static StateMap sm[N];
+
+ // update the model
+ int i;
+ for ( i=0; i<N; ++i)
+ t[cxt[i]]=nex(t[cxt[i]],y);
+
+ // update the contexts (pixels surrounding the predicted one)
+ r0+=r0+y;
+ r1+=r1+((buf(215)>>(7-bpos))&1);
+ r2+=r2+((buf(431)>>(7-bpos))&1);
+ r3+=r3+((buf(647)>>(7-bpos))&1);
+ cxt[0]=r0&0x7|r1>>4&0x38|r2>>3&0xc0;
+ cxt[1]=0x100+(r0&1|r1>>4&0x3e|r2>>2&0x40|r3>>1&0x80);
+ cxt[2]=0x200+(r0&0x3f^r1&0x3ffe^r2<<2&0x7f00^r3<<5&0xf800);
+
+ // predict
+ for ( i=0; i<N; ++i)
+ m.add(stretch(sm[i].p(t[cxt[i]])));
+}
+
+//////////////////////////// wordModel /////////////////////////
+
+// Model English text (words and columns/end of line)
+
+void wordModel(Mixer& m) {
+ static U32 word0=0, word1=0, word2=0, word3=0, word4=0, word5=0; // hashes
+ static U32 text0=0; // hash stream of letters
+ static ContextMap cm(MEM*16, 20);
+ static int nl1=-3, nl=-2; // previous, current newline position
+
+ // Update word hashes
+ if (bpos==0) {
+ int c=c4&255;
+ if (c>='A' && c<='Z')
+ c+='a'-'A';
+ if (c>='a' && c<='z' || c>=128) {
+ word0=word0*263*32+c;
+ text0=text0*997*16+c;
+ }
+ else if (word0) {
+ word5=word4*23;
+ word4=word3*19;
+ word3=word2*17;
+ word2=word1*13;
+ word1=word0*11;
+ word0=0;
+ }
+ if (c==10) nl1=nl, nl=pos-1;
+ int col=min(255, pos-nl), above=buf[nl1+col]; // text column context
+ U32 h=word0*271+buf(1);
+
+ cm.set(h);
+ cm.set(word0);
+ cm.set(h+word1);
+ cm.set(word0+word1*31);
+ cm.set(h+word1+word2*29);
+ cm.set(text0&0xffffff);
+ cm.set(text0&0xfffff);
+
+ cm.set(h+word2);
+ cm.set(h+word3);
+ cm.set(h+word4);
+ cm.set(h+word5);
+ cm.set(buf(1)|buf(3)<<8|buf(5)<<16);
+ cm.set(buf(2)|buf(4)<<8|buf(6)<<16);
+
+ cm.set(h+word1+word3);
+ cm.set(h+word2+word3);
+
+ // Text column models
+ cm.set(col<<16|buf(1)<<8|above);
+ cm.set(buf(1)<<8|above);
+ cm.set(col<<8|buf(1));
+ cm.set(col);
+ }
+ cm.mix(m);
+}
+
+//////////////////////////// recordModel ///////////////////////
+
+// Model 2-D data with fixed record length. Also order 1-2 models
+// that include the distance to the last match.
+
+void recordModel(Mixer& m) {
+ static int cpos1[256] , cpos2[256], cpos3[256], cpos4[256];
+ static int wpos1[0x10000]; // buf(1..2) -> last position
+ static int rlen=2, rlen1=3, rlen2=4; // run length and 2 candidates
+ static int rcount1=0, rcount2=0; // candidate counts
+ static ContextMap cm(32768, 3), cn(32768/2, 3), co(32768*2, 3), cp(MEM, 3);
+
+ // Find record length
+ if (!bpos) {
+ int w=c4&0xffff, c=w&255, d=w>>8;
+#if 1
+ int r=pos-cpos1[c];
+ if (r>1 && r==cpos1[c]-cpos2[c]
+ && r==cpos2[c]-cpos3[c] && r==cpos3[c]-cpos4[c]
+ && (r>15 || (c==buf(r*5+1)) && c==buf(r*6+1))) {
+ if (r==rlen1) ++rcount1;
+ else if (r==rlen2) ++rcount2;
+ else if (rcount1>rcount2) rlen2=r, rcount2=1;
+ else rlen1=r, rcount1=1;
+ }
+ if (rcount1>15 && rlen!=rlen1) rlen=rlen1, rcount1=rcount2=0;
+ if (rcount2>15 && rlen!=rlen2) rlen=rlen2, rcount1=rcount2=0;
+
+ // Set 2 dimensional contexts
+ assert(rlen>0);
+#endif
+ cm.set(c<<8| (min(255, pos-cpos1[c])/4) );
+ cm.set(w<<9| llog(pos-wpos1[w])>>2);
+
+ cm.set(rlen|buf(rlen)<<10|buf(rlen*2)<<18);
+ cn.set(w|rlen<<8);
+ cn.set(d|rlen<<16);
+ cn.set(c|rlen<<8);
+
+ co.set(buf(1)<<8|min(255, pos-cpos1[buf(1)]));
+ co.set(buf(1)<<17|buf(2)<<9|llog(pos-wpos1[w])>>2);
+ int col=pos%rlen;
+ co.set(buf(1)<<8|buf(rlen));
+
+ //cp.set(w*16);
+ //cp.set(d*32);
+ //cp.set(c*64);
+ cp.set(rlen|buf(rlen)<<10|col<<18);
+ cp.set(rlen|buf(1)<<10|col<<18);
+ cp.set(col|rlen<<12);
+
+ // update last context positions
+ cpos4[c]=cpos3[c];
+ cpos3[c]=cpos2[c];
+ cpos2[c]=cpos1[c];
+ cpos1[c]=pos;
+ wpos1[w]=pos;
+ }
+ cm.mix(m);
+ cn.mix(m);
+ co.mix(m);
+ cp.mix(m);
+}
+
+
+//////////////////////////// sparseModel ///////////////////////
+
+// Model order 1-2 contexts with gaps.
+
+void sparseModel(Mixer& m, int seenbefore, int howmany) {
+ static ContextMap cm(MEM*2, 48);
+ static int mask = 0;
+
+ if (bpos==0) {
+
+ cm.set( c4&0x00f0f0f0);
+ cm.set((c4&0xf0f0f0f0)+1);
+ cm.set((c4&0x00f8f8f8)+2);
+ cm.set((c4&0xf8f8f8f8)+3);
+ cm.set((c4&0x00e0e0e0)+4);
+ cm.set((c4&0xe0e0e0e0)+5);
+ cm.set((c4&0x00f0f0ff)+6);
+
+ cm.set(seenbefore);
+ cm.set(howmany);
+ cm.set(c4&0x00ff00ff);
+ cm.set(c4&0xff0000ff);
+ cm.set(buf(1)|buf(5)<<8);
+ cm.set(buf(1)|buf(6)<<8);
+ cm.set(buf(3)|buf(6)<<8);
+ cm.set(buf(4)|buf(8)<<8);
+
+ for (int i=1; i<8; ++i) {
+ cm.set((buf(i+1)<<8)|buf(i+2));
+ cm.set((buf(i+1)<<8)|buf(i+3));
+ cm.set(seenbefore|buf(i)<<8);
+ }
+
+ int fl = 0;
+ if( c4&0xff != 0 ){
+ if( isalpha( c4&0xff ) ) fl = 1;
+ else if( ispunct( c4&0xff ) ) fl = 2;
+ else if( isspace( c4&0xff ) ) fl = 3;
+ else if( c4&0xff == 0xff ) fl = 4;
+ else if( c4&0xff < 16 ) fl = 5;
+ else if( c4&0xff < 64 ) fl = 6;
+ else fl = 7;
+ }
+ mask = (mask<<3)|fl;
+ cm.set(mask);
+ cm.set(mask<<8|buf(1));
+ cm.set(mask<<17|buf(2)<<8|buf(3));
+ cm.set(mask&0x1ff|((c4&0xf0f0f0f0)<<9));
+ }
+ cm.mix(m);
+}
+
+//////////////////////////// distanceModel ///////////////////////
+
+// Model for modelling distances between symbols
+
+void distanceModel(Mixer& m) {
+ static ContextMap cr(MEM, 3);
+ if( bpos == 0 ){
+ static int pos00=0,pos20=0,posnl=0;
+ int c=c4&0xff;
+ if(c==0x00)pos00=pos;
+ if(c==0x20)pos20=pos;
+ if(c==0xff||c=='\r'||c=='\n')posnl=pos;
+ cr.set(min(pos-pos00,255)|(c<<8));
+ cr.set(min(pos-pos20,255)|(c<<8));
+ cr.set(min(pos-posnl,255)|(c<<8)+234567);
+ }
+ cr.mix(m);
+}
+
+//////////////////////////// bmpModel /////////////////////////////////
+
+// Model a 24-bit color uncompressed .bmp or .tif file. Return
+// width in pixels if an image file is detected, else 0.
+
+// 32-bit little endian number at buf(i)..buf(i-3)
+inline U32 i4(int i) {
+ assert(i>3);
+ return buf(i)+256*buf(i-1)+65536*buf(i-2)+16777216*buf(i-3);
+}
+
+// 16-bit
+inline int i2(int i) {
+ assert(i>1);
+ return buf(i)+256*buf(i-1);
+}
+
+// Square buf(i)
+inline int sqrbuf(int i) {
+ assert(i>0);
+ return buf(i)*buf(i);
+}
+
+int bmpModel(Mixer& m) {
+ static int w=0; // width of image in bytes (pixels * 3)
+ static int eoi=0; // end of image
+ static U32 tiff=0; // offset of tif header
+ const int SC=0x20000;
+ static SmallStationaryContextMap scm1(SC), scm2(SC),
+ scm3(SC), scm4(SC), scm5(SC), scm6(SC), scm7(SC), scm8(SC), scm9(SC*2), scm10(SC);
+ static ContextMap cm(MEM*4, 13);
+
+ // Detect .bmp file header (24 bit color, not compressed)
+ if (!bpos && buf(54)=='B' && buf(53)=='M'
+ && i4(44)==54 && i4(40)==40 && i4(24)==0) {
+ w=(i4(36)+3&-4)*3; // image width
+ const int height=i4(32);
+ eoi=pos;
+ if (w<0x30000 && height<0x10000) {
+ eoi=pos+w*height; // image size in bytes
+ printf("BMP %dx%d ", w/3, height);
+ }
+ else
+ eoi=pos;
+ }
+
+ // Detect .tif file header (24 bit color, not compressed).
+ // Parsing is crude, won't work with weird formats.
+ if (!bpos) {
+ if (c4==0x49492a00) tiff=pos; // Intel format only
+ if (pos-tiff==4 && c4!=0x08000000) tiff=0; // 8=normal offset to directory
+ if (tiff && pos-tiff==200) { // most of directory should be read by now
+ int dirsize=i2(pos-tiff-4); // number of 12-byte directory entries
+ w=0;
+ int bpp=0, compression=0, width=0, height=0;
+ for (int i=tiff+6; i<pos-12 && --dirsize>0; i+=12) {
+ int tag=i2(pos-i); // 256=width, 257==height, 259: 1=no compression
+ // 277=3 samples/pixel
+ int tagfmt=i2(pos-i-2); // 3=short, 4=long
+ int taglen=i4(pos-i-4); // number of elements in tagval
+ int tagval=i4(pos-i-8); // 1 long, 1-2 short, or points to array
+ if ((tagfmt==3||tagfmt==4) && taglen==1) {
+ if (tag==256) width=tagval;
+ if (tag==257) height=tagval;
+ if (tag==259) compression=tagval; // 1 = no compression
+ if (tag==277) bpp=tagval; // should be 3
+ }
+ }
+ if (width>0 && height>0 && width*height>50 && compression==1
+ && (bpp==1||bpp==3))
+ eoi=tiff+width*height*bpp, w=width*bpp;
+ if (eoi>pos)
+ printf("TIFF %dx%dx%d ", width, height, bpp);
+ else
+ tiff=w=0;
+ }
+ }
+ if (pos>eoi) return w=0;
+
+ // Select nearby pixels as context
+ if (!bpos) {
+ assert(w>3);
+ int color=pos%3;
+ int mean=buf(3)+buf(w-3)+buf(w)+buf(w+3);
+ const int var=sqrbuf(3)+sqrbuf(w-3)+sqrbuf(w)+sqrbuf(w+3)-mean*mean/4>>2;
+ mean>>=2;
+ const int logvar=ilog(var);
+ int i=0;
+ cm.set(hash(++i, buf(3), color));
+ cm.set(hash(++i, buf(3), buf(1), color));
+ cm.set(hash(++i, buf(3), buf(1)>>2, buf(2)>>6, color));
+ cm.set(hash(++i, buf(w), color));
+ cm.set(hash(++i, buf(w), buf(1), color));
+ cm.set(hash(++i, buf(w), buf(1)>>2, buf(2)>>6, color));
+ cm.set(hash(++i, buf(3)+buf(w)>>3, buf(1)>>5, buf(2)>>5, color));
+ cm.set(hash(++i, buf(1), buf(2), color));
+ cm.set(hash(++i, buf(3), buf(1)-buf(4), color));
+ cm.set(hash(++i, buf(3)+buf(1)-buf(4), color));
+ cm.set(hash(++i, buf(w), buf(1)-buf(w+1), color));
+ cm.set(hash(++i, buf(w)+buf(1)-buf(w+1), color));
+ cm.set(hash(++i, mean, logvar>>5, color));
+ scm1.set(buf(3)+buf(w)-buf(w+3));
+ scm2.set(buf(3)+buf(w-3)-buf(w));
+ scm3.set(buf(3)*2-buf(6));
+ scm4.set(buf(w)*2-buf(w*2));
+ scm5.set(buf(w+3)*2-buf(w*2+6));
+ scm6.set(buf(w-3)*2-buf(w*2-6));
+ scm7.set(buf(w-3)+buf(1)-buf(w-2));
+ scm8.set(buf(w)+buf(w-3)-buf(w*2-3));
+ scm9.set(mean>>1|logvar<<1&0x180);
+ }
+
+ // Predict next bit
+ scm1.mix(m);
+ scm2.mix(m);
+ scm3.mix(m);
+ scm4.mix(m);
+ scm5.mix(m);
+ scm6.mix(m);
+ scm7.mix(m);
+ scm8.mix(m);
+ scm9.mix(m);
+ scm10.mix(m);
+ cm.mix(m);
+ return w;
+}
+
+void model8bit(Mixer& m, int w) {
+ const int SC=0x20000;
+ static SmallStationaryContextMap scm1(SC), scm2(SC),
+ scm3(SC), scm4(SC), scm5(SC), scm6(SC*2),scm7(SC);
+ static ContextMap cm(MEM*4, 32);
+
+ // Select nearby pixels as context
+ if (!bpos) {
+ assert(w>3);
+ int mean=buf(1)+buf(w-1)+buf(w)+buf(w+1);
+ const int var=sqrbuf(1)+sqrbuf(w-1)+sqrbuf(w)+sqrbuf(w+1)-mean*mean/4>>2;
+ mean>>=2;
+ const int logvar=ilog(var);
+ int i=0;
+ // 2 x
+ cm.set(hash(++i, buf(1)>>2, buf(w)>>2));
+ cm.set(hash(++i, buf(1)>>2, buf(2)>>2));
+ cm.set(hash(++i, buf(w)>>2, buf(w*2)>>2));
+ cm.set(hash(++i, buf(1)>>2, buf(w-1)>>2));
+ cm.set(hash(++i, buf(w)>>2, buf(w+1)>>2));
+ cm.set(hash(++i, buf(w+1)>>2, buf(w+2)>>2));
+ cm.set(hash(++i, buf(w+1)>>2, buf(w*2+2)>>2));
+ cm.set(hash(++i, buf(w-1)>>2, buf(w*2-2)>>2));
+ cm.set(hash(++i, buf(1)+buf(w)>>1));
+ cm.set(hash(++i, buf(1)+buf(2)>>1));
+ cm.set(hash(++i, buf(w)+buf(w*2)>>1));
+ cm.set(hash(++i, buf(1)+buf(w-1)>>1));
+ cm.set(hash(++i, buf(w)+buf(w+1)>>1));
+ cm.set(hash(++i, buf(w+1)+buf(w+2)>>1));
+ cm.set(hash(++i, buf(w+1)+buf(w*2+2)>>1));
+ cm.set(hash(++i, buf(w-1)+buf(w*2-2)>>1));
+
+ // 3 x
+ cm.set(hash(++i, buf(w)>>2, buf(1)>>2, buf(w-1)>>2));
+ cm.set(hash(++i, buf(w-1)>>2, buf(w)>>2, buf(w+1)>>2));
+ cm.set(hash(++i, buf(1)>>2, buf(w-1)>>2, buf(w*2-1)>>2));
+
+ // mixed
+ cm.set(hash(++i, buf(3)+buf(w)>>1, buf(1)>>2, buf(2)>>2));
+ cm.set(hash(++i, buf(2)+buf(1)>>1,buf(w)+buf(w*2)>>1,buf(w-1)>>2));
+ cm.set(hash(++i, buf(2)+buf(1)>>2,buf(w-1)+buf(w)>>2));
+ cm.set(hash(++i, buf(2)+buf(1)>>1,buf(w)+buf(w*2)>>1));
+ cm.set(hash(++i, buf(2)+buf(1)>>1,buf(w-1)+buf(w*2-2)>>1));
+ cm.set(hash(++i, buf(2)+buf(1)>>1,buf(w+1)+buf(w*2+2)>>1));
+ cm.set(hash(++i, buf(w)+buf(w*2)>>1,buf(w-1)+buf(w*2+2)>>1));
+ cm.set(hash(++i, buf(w-1)+buf(w)>>1,buf(w)+buf(w+1)>>1));
+ cm.set(hash(++i, buf(1)+buf(w-1)>>1,buf(w)+buf(w*2)>>1));
+ cm.set(hash(++i, buf(1)+buf(w-1)>>2,buf(w)+buf(w+1)>>2));
+
+ cm.set(hash(++i, (buf(1)-buf(w-1)>>1)+buf(w)>>2));
+ cm.set(hash(++i, (buf(w-1)-buf(w)>>1)+buf(1)>>2));
+ cm.set(hash(++i, -buf(1)+buf(w-1)+buf(w)>>2));
+
+ scm1.set(buf(1)+buf(w)>>1);
+ scm2.set(buf(1)+buf(w)-buf(w+1)>>1);
+ scm3.set(buf(1)*2-buf(2)>>1);
+ scm4.set(buf(w)*2-buf(w*2)>>1);
+ scm5.set(buf(1)+buf(w)-buf(w-1)>>1);
+ scm6.set(mean>>1|logvar<<1&0x180);
+ }
+
+ // Predict next bit
+ scm1.mix(m);
+ scm2.mix(m);
+ scm3.mix(m);
+ scm4.mix(m);
+ scm5.mix(m);
+ scm6.mix(m);
+ scm7.mix(m); // Amazingly but improves compression!
+ cm.mix(m);
+ //return w;
+}
+
+//////////////////////////// pgmModel /////////////////////////////////
+
+// Model a 8-bit grayscale uncompressed binary .pgm and 8-bit color
+// uncompressed .bmp images. Return width in pixels if an image file
+// is detected, else 0.
+
+#define ISWHITESPACE(i) (buf(i) == ' ' || buf(i) == '\t' || buf(i) == '\n' || buf(i) == '\r')
+#define ISCRLF(i) (buf(i) == '\n' || buf(i) == '\r')
+
+int pgmModel(Mixer& m) {
+ static int h = 0; // height of image in bytes (pixels)
+ static int w = 0; // width of image in bytes (pixels)
+ static int eoi = 0; // end of image
+ static int pgm = 0; // offset of pgm header
+ static int pgm_hdr[3]; // 0 - Width, 1 - Height, 2 - Max value
+ static int pgm_ptr; // which record in header should be parsed next
+ int isws; // is white space
+ char v_buf[32];
+ int v_ptr;
+ if (!bpos)
+ {
+ if(buf(3)=='P' && buf(2)=='5' && ISWHITESPACE(1)) // Detect PGM file
+ {
+ pgm = pos;
+ pgm_ptr = 0;
+ return w = 0; // PGM header just detected, not enough info to get header yet
+ }else
+ if(pgm && pgm_ptr!=3) // PGM detected, let's parse header records
+ {
+ for (int i = pgm; i<pos-1 && pgm_ptr<3; i++)
+ {
+ // Skip white spaces
+ while ((isws = ISWHITESPACE(pos-i)) && i<pos-1) i++;
+ if(isws) break; // buffer end is reached
+
+ // Skip comments
+ if(buf(pos-i)=='#')
+ {
+ do {
+ i++;
+ }while(!ISCRLF(pos-i) && i<pos-1);
+ }else
+ {
+ // Get header record as a string into v_buf
+ v_ptr = 0;
+ do {
+ v_buf[v_ptr++] = buf(pos-i);
+ i++;
+ }while(!(isws = ISWHITESPACE(pos-i)) && i<pos-1 && v_ptr<32);
+
+ if(isws)
+ {
+ pgm_hdr[pgm_ptr++] = atoi(v_buf);
+ pgm = i; // move pointer
+ }
+ }
+ }
+
+ // Header is finished, next byte is first pixel
+ if(pgm_ptr==3)
+ {
+ if(pgm_hdr[2] == 255 && pgm_hdr[0]>0 && pgm_hdr[1]>0)
+ {
+ w = pgm_hdr[0];
+ h = pgm_hdr[1];
+ eoi = pos+w*h;
+ printf("PGM %dx%d",w,h);
+ }
+ }
+ }
+ }
+ if (pos>eoi) return w=0;
+ model8bit(m,w);
+ static int col=0;
+ if (++col>=8) col=0; // reset after every 24 columns?
+ m.set(2, 8);
+ m.set(col, 8);
+ m.set(buf(w)+buf(1)>>4, 32);
+ m.set(c0, 256);
+ return w;
+}
+
+int bmpModel8(Mixer& m) {
+ static int h = 0; // height of image in bytes (pixels)
+ static int w = 0; // width of image in bytes (pixels)
+ static int eoi = 0; // end of image
+ static int col = 0;
+ static int ibmp=0,w1=0;
+ if (bpos==0) {
+ // 8-bit .bmp images data offset windows bmp compression bpp
+ if (/*buf(54)=='B' && buf(53)=='M' && */(i4(44)< 1079) && i4(40)==40 && i4(24)==0 && (buf(26)==8 /*| buf(26)==4)*/)){
+ /* if (buf(26)==4)
+ w1=i4(36)/2; // image width
+ else*/
+ w1=i4(36); // image width 8453632 -> 2079974
+ h=i4(32); // image height
+ ibmp=pos+i4(44)-54;
+ }
+ if (ibmp==pos) {
+ w=w1;
+ eoi=pos+w*h;
+ printf("BMP(8-bit) %dx%d",w,h);
+ ibmp=0;
+ }
+ }
+ if (pos>eoi) return w=0;
+ model8bit(m,w);
+ if (++col>=8) col=0; // reset after every 24 columns?
+ m.set(2, 8);
+ m.set(col, 8);
+ m.set(buf(w)+buf(1)>>4, 32);
+ m.set(c0, 256);
+ return w;
+}
+
+int rgbModel8(Mixer& m) {
+ int h = 0; // height of image in bytes (pixels)
+ static int w = 0; // width of image in bytes (pixels)
+ static int eoi = 0; // end of image
+ static int col = 0;
+ // for .rgb gray images
+ if (bpos==0) {
+ if (buf(507)==1 && buf(506)==218 && buf(505)==0 && i2(496)==1)
+ {
+ w=(buf(501)&255)*256|(buf(500)&255); // image width
+ h=(buf(499)&255)*256|(buf(498)&255); // image height
+ eoi=pos+w*h;
+ printf("RGB(8-bit) %dx%d",w,h);
+ }
+ }
+ if (pos>eoi) return w=0;
+ model8bit(m,w);
+ if (++col>=8) col=0; // reset after every 24 columns?
+ m.set(2, 8);
+ m.set(col, 8);
+ m.set(buf(w)+buf(1)>>4, 32);
+ m.set(c0, 256);
+ return w;
+}
+
+//////////////////////////// jpegModel /////////////////////////
+
+// Model JPEG. Return 1-257 if a JPEG file is detected or else 0.
+// Only the baseline and 8 bit extended Huffman coded DCT modes are
+// supported. The model partially decodes the JPEG image to provide
+// context for the Huffman coded symbols.
+
+// Print a JPEG segment at buf[p...] for debugging
+void dump(const char* msg, int p) {
+ printf("%s:", msg);
+ int len=buf[p+2]*256+buf[p+3];
+ for (int i=0; i<len+2; ++i)
+ printf(" %02X", buf[p+i]);
+ printf("\n");
+}
+
+// Detect invalid JPEG data. The proper response is to silently
+// fall back to a non-JPEG model.
+#define jassert(x) if (!(x)) { \
+/* printf("JPEG error at %d, line %d: %s\n", pos, __LINE__, #x); */ \
+ jpeg=0; \
+ return next_jpeg;}
+
+struct HUF {U32 min, max; int val;}; // Huffman decode tables
+ // huf[Tc][Th][m] is the minimum, maximum+1, and pointer to codes for
+ // coefficient type Tc (0=DC, 1=AC), table Th (0-3), length m+1 (m=0-15)
+
+void update_k(int v1, int v2, int &k1, int &k2) {
+ int a, b, c;
+ a=abs(v1*(k1-1)+v2*(8-(k1-1)))/8;
+ b=abs(v1*(k1+0)+v2*(8-(k1+0)))/8;
+ c=abs(v1*(k1+1)+v2*(8-(k1+1)))/8;
+ if (k1==0) a=b; else if (k1==8) c=b;
+ if (a<b && a<c) k2--;
+ if (c<a && c<b) k2++;
+ if (k2<-2) {k1--;k2=0;}
+ if (k2>+2) {k1++;k2=0;}
+}
+
+int jpegModel(Mixer& m) {
+
+ // State of parser
+ enum {SOF0=0xc0, SOF1, SOF2, SOF3, DHT, RST0=0xd0, SOI=0xd8, EOI, SOS, DQT,
+ DNL, DRI, APP0=0xe0, COM=0xfe, FF}; // Second byte of 2 byte codes
+ static int jpeg=0; // 1 if JPEG is header detected, 2 if image data
+ static int next_jpeg=0; // updated with jpeg on next byte boundary
+ static int app; // Bytes remaining to skip in APPx or COM field
+ static int sof=0, sos=0, data=0; // pointers to buf
+ static Array<int> ht(8); // pointers to Huffman table headers
+ static int htsize=0; // number of pointers in ht
+
+ // Huffman decode state
+ static U32 huffcode=0; // Current Huffman code including extra bits
+ static int huffbits=0; // Number of valid bits in huffcode
+ static int huffsize=0; // Number of bits without extra bits
+ static int rs=-1; // Decoded huffcode without extra bits. It represents
+ // 2 packed 4-bit numbers, r=run of zeros, s=number of extra bits for
+ // first nonzero code. huffcode is complete when rs >= 0.
+ // rs is -1 prior to decoding incomplete huffcode.
+ static int mcupos=0; // position in MCU (0-639). The low 6 bits mark
+ // the coefficient in zigzag scan order (0=DC, 1-63=AC). The high
+ // bits mark the block within the MCU, used to select Huffman tables.
+
+ // Decoding tables
+ static Array<HUF> huf(128); // Tc*64+Th*16+m -> min, max, val
+ static int mcusize=0; // number of coefficients in an MCU
+ static int linesize=0; // width of image in MCU
+ static int hufsel[2][10]; // DC/AC, mcupos/64 -> huf decode table
+ static Array<U8> hbuf(2048); // Tc*1024+Th*256+hufcode -> RS
+
+ // Image state
+ static Array<int> color(10); // block -> component (0-3)
+ static Array<int> pred(4); // component -> last DC value
+ static int dc=0; // DC value of the current block
+ static int width=0; // Image width in MCU
+ static int row=0, column=0; // in MCU (column 0 to width-1)
+ static Buf cbuf(0x20000); // Rotating buffer of coefficients, coded as:
+ // DC: level shifted absolute value, low 4 bits discarded, i.e.
+ // [-1023...1024] -> [0...255].
+ // AC: as an RS code: a run of R (0-15) zeros followed by an S (0-15)
+ // bit number, or 00 for end of block (in zigzag order).
+ // However if R=0, then the format is ssss11xx where ssss is S,
+ // xx is the first 2 extra bits, and the last 2 bits are 1 (since
+ // this never occurs in a valid RS code).
+ static int cpos=0; // position in cbuf
+ static U32 huff1=0, huff2=0, huff3=0, huff4=0; // hashes of last codes
+ static int rs1, rs2, rs3, rs4; // last 4 RS codes
+ static int ssum=0, ssum1=0, ssum2=0, ssum3=0, ssum4=0;
+ // sum of S in RS codes in block and last 4 values
+
+ static IntBuf cbuf2(0x20000);
+ static Array<int> adv_pred(7), sumu(8), sumv(8);
+ static Array<int> ls(10); // block -> distance to previous block
+ static Array<int> lcp(4), zpos(64);
+
+ //for parsing Quantization tables
+ static int dqt_state = -1, dqt_end = 0, qnum = 0;
+ static Array<U8> qtab(256); // table
+ static Array<int> qmap(10); // block -> table number
+
+ const static U8 zzu[64]={ // zigzag coef -> u,v
+ 0,1,0,0,1,2,3,2,1,0,0,1,2,3,4,5,4,3,2,1,0,0,1,2,3,4,5,6,7,6,5,4,
+ 3,2,1,0,1,2,3,4,5,6,7,7,6,5,4,3,2,3,4,5,6,7,7,6,5,4,5,6,7,7,6,7};
+ const static U8 zzv[64]={
+ 0,0,1,2,1,0,0,1,2,3,4,3,2,1,0,0,1,2,3,4,5,6,5,4,3,2,1,0,0,1,2,3,
+ 4,5,6,7,7,6,5,4,3,2,1,2,3,4,5,6,7,7,6,5,4,3,4,5,6,7,7,6,5,6,7,7};
+
+ // Be sure to quit on a byte boundary
+ if (!bpos) next_jpeg=jpeg>1;
+ if (bpos && !jpeg) return next_jpeg;
+ if (!bpos && app>0) --app;
+ if (app>0) return next_jpeg;
+ if (!bpos) {
+
+ // Parse. Baseline DCT-Huffman JPEG syntax is:
+ // SOI APPx... misc... SOF0 DHT... SOS data EOI
+ // SOI (= FF D8) start of image.
+ // APPx (= FF Ex) len ... where len is always a 2 byte big-endian length
+ // including the length itself but not the 2 byte preceding code.
+ // Application data is ignored. There may be more than one APPx.
+ // misc codes are DQT, DNL, DRI, COM (ignored).
+ // SOF0 (= FF C0) len 08 height width Nf [C HV Tq]...
+ // where len, height, width (in pixels) are 2 bytes, Nf is the repeat
+ // count (1 byte) of [C HV Tq], where C is a component identifier
+ // (color, 0-3), HV is the horizontal and vertical dimensions
+ // of the MCU (high, low bits, packed), and Tq is the quantization
+ // table ID (not used). An MCU (minimum compression unit) consists
+ // of 64*H*V DCT coefficients for each color.
+ // DHT (= FF C4) len [TcTh L1...L16 V1,1..V1,L1 ... V16,1..V16,L16]...
+ // defines Huffman table Th (1-4) for Tc (0=DC (first coefficient)
+ // 1=AC (next 63 coefficients)). L1..L16 are the number of codes
+ // of length 1-16 (in ascending order) and Vx,y are the 8-bit values.
+ // A V code of RS means a run of R (0-15) zeros followed by S (0-15)
+ // additional bits to specify the next nonzero value, negative if
+ // the first additional bit is 0 (e.g. code x63 followed by the
+ // 3 bits 1,0,1 specify 7 coefficients: 0, 0, 0, 0, 0, 0, 5.
+ // Code 00 means end of block (remainder of 63 AC coefficients is 0).
+ // SOS (= FF DA) len Ns [Cs TdTa]... 0 3F 00
+ // Start of scan. TdTa specifies DC/AC Huffman tables (0-3, packed
+ // into one byte) for component Cs matching C in SOF0, repeated
+ // Ns (1-4) times.
+ // EOI (= FF D9) is end of image.
+ // Huffman coded data is between SOI and EOI. Codes may be embedded:
+ // RST0-RST7 (= FF D0 to FF D7) mark the start of an independently
+ // compressed region.
+ // DNL (= FF DC) 04 00 height
+ // might appear at the end of the scan (ignored).
+ // FF 00 is interpreted as FF (to distinguish from RSTx, DNL, EOI).
+
+ // Detect JPEG (SOI, APPx)
+ if (!jpeg && buf(4)==FF && buf(3)==SOI && buf(2)==FF && buf(1)>>4==0xe) {
+ jpeg=1;
+ app=sos=sof=htsize=data=mcusize=linesize=0;
+ huffcode=huffbits=huffsize=mcupos=cpos=0, rs=-1;
+ memset(&huf[0], 0, huf.size()*sizeof(HUF));
+ memset(&pred[0], 0, pred.size()*sizeof(int));
+ }
+
+ // Detect end of JPEG when data contains a marker other than RSTx
+ // or byte stuff (00).
+ if (jpeg && data && buf(2)==FF && buf(1) && (buf(1)&0xf8)!=RST0) {
+ jassert(buf(1)==EOI);
+ jpeg=0;
+ }
+ if (!jpeg) return next_jpeg;
+
+ // Detect APPx or COM field
+ if (!data && !app && buf(4)==FF && (buf(3)>>4==0xe || buf(3)==COM))
+ app=buf(2)*256+buf(1)+2;
+
+ // Save pointers to sof, ht, sos, data,
+ if (buf(5)==FF && buf(4)==SOS) {
+ int len=buf(3)*256+buf(2);
+ if (len==6+2*buf(1) && buf(1) && buf(1)<=4) // buf(1) is Ns
+ sos=pos-5, data=sos+len+2, jpeg=2;
+ }
+ if (buf(4)==FF && buf(3)==DHT && htsize<8) ht[htsize++]=pos-4;
+ if (buf(4)==FF && buf(3)==SOF0) sof=pos-4;
+
+ // Parse Quantizazion tables
+ if (buf(4)==FF && buf(3)==DQT)
+ dqt_end=pos+buf(2)*256+buf(1)-1, dqt_state=0;
+ else if (dqt_state>=0) {
+ if (pos>=dqt_end)
+ dqt_state = -1;
+ else {
+ if (dqt_state%65==0)
+ qnum = buf(1);
+ else {
+ jassert(buf(1)>0);
+ jassert(qnum>=0 && qnum<4);
+ qtab[qnum*64+((dqt_state%65)-1)]=buf(1)-1;
+ }
+ dqt_state++;
+ }
+ }
+
+ // Restart
+ if (buf(2)==FF && (buf(1)&0xf8)==RST0) {
+ huffcode=huffbits=huffsize=mcupos=0, rs=-1;
+ memset(&pred[0], 0, pred.size()*sizeof(int));
+ }
+ }
+
+ {
+ // Build Huffman tables
+ // huf[Tc][Th][m] = min, max+1 codes of length m, pointer to byte values
+ if (pos==data && bpos==1) {
+ jassert(htsize>0);
+ int i;
+ for ( i=0; i<htsize; ++i) {
+ int p=ht[i]+4; // pointer to current table after length field
+ int end=p+buf[p-2]*256+buf[p-1]-2; // end of Huffman table
+ int count=0; // sanity check
+ while (p<end && end<pos && end<p+2100 && ++count<10) {
+ int tc=buf[p]>>4, th=buf[p]&15;
+ if (tc>=2 || th>=4) break;
+ jassert(tc>=0 && tc<2 && th>=0 && th<4);
+ HUF* h=&huf[tc*64+th*16]; // [tc][th][0];
+ int val=p+17; // pointer to values
+ int hval=tc*1024+th*256; // pointer to RS values in hbuf
+ int j;
+ for ( j=0; j<256; ++j) // copy RS codes
+ hbuf[hval+j]=buf[val+j];
+ int code=0;
+ for ( j=0; j<16; ++j) {
+ h[j].min=code;
+ h[j].max=code+=buf[p+j+1];
+ h[j].val=hval;
+ val+=buf[p+j+1];
+ hval+=buf[p+j+1];
+ code*=2;
+ }
+ p=val;
+ jassert(hval>=0 && hval<2048);
+ }
+ jassert(p==end);
+ }
+ huffcode=huffbits=huffsize=0, rs=-1;
+
+ // Build Huffman table selection table (indexed by mcupos).
+ // Get image width.
+ if (!sof && sos) return next_jpeg;
+ int ns=buf[sos+4];
+ int nf=buf[sof+9];
+ jassert(ns<=4 && nf<=4);
+ mcusize=0; // blocks per MCU
+ int hmax=0; // MCU horizontal dimension
+ for ( i=0; i<ns; ++i) {
+ for (int j=0; j<nf; ++j) {
+ if (buf[sos+2*i+5]==buf[sof+3*j+10]) { // Cs == C ?
+ int hv=buf[sof+3*j+11]; // packed dimensions H x V
+ if (hv>>4>hmax) hmax=hv>>4;
+ hv=(hv&15)*(hv>>4); // number of blocks in component C
+ jassert(hv>=1 && hv+mcusize<=10);
+ while (hv) {
+ jassert(mcusize<10);
+ hufsel[0][mcusize]=buf[sos+2*i+6]>>4&15;
+ hufsel[1][mcusize]=buf[sos+2*i+6]&15;
+ jassert (hufsel[0][mcusize]<4 && hufsel[1][mcusize]<4);
+ color[mcusize]=i;
+ int tq=buf[sof+3*j+12]; // quantization table index (0..3)
+ jassert(tq>=0 && tq<4);
+ qmap[mcusize]=tq; // quantizazion table mapping
+ --hv;
+ ++mcusize;
+ }
+ }
+ }
+ }
+ jassert(hmax>=1 && hmax<=10);
+ int j;
+ for ( j=0; j<mcusize; ++j) {
+ ls[j]=0;
+ for (int i=1; i<mcusize; ++i) if (color[(j+i)%mcusize]==color[j]) ls[j]=i;
+ ls[j]=mcusize-ls[j]<<6;
+ }
+ for ( j=0; j<64; ++j) zpos[zzu[j]+8*zzv[j]]=j;
+ width=buf[sof+7]*256+buf[sof+8]; // in pixels
+ int height=buf[sof+5]*256+buf[sof+6];
+ printf("JPEG %dx%d ", width, height);
+ width=(width-1)/(hmax*8)+1; // in MCU
+ jassert(width>0);
+ mcusize*=64; // coefficients per MCU
+ row=column=0;
+ }
+ }
+
+
+ // Decode Huffman
+ {
+ if (mcusize && buf(1+(!bpos))!=FF) { // skip stuffed byte
+ jassert(huffbits<=32);
+ huffcode+=huffcode+y;
+ ++huffbits;
+ if (rs<0) {
+ jassert(huffbits>=1 && huffbits<=16);
+ const int ac=(mcupos&63)>0;
+ jassert(mcupos>=0 && (mcupos>>6)<10);
+ jassert(ac==0 || ac==1);
+ const int sel=hufsel[ac][mcupos>>6];
+ jassert(sel>=0 && sel<4);
+ const int i=huffbits-1;
+ jassert(i>=0 && i<16);
+ const HUF *h=&huf[ac*64+sel*16]; // [ac][sel];
+ jassert(h[i].min<=h[i].max && h[i].val<2048 && huffbits>0);
+ if (huffcode<h[i].max) {
+ jassert(huffcode>=h[i].min);
+ int k=h[i].val+huffcode-h[i].min;
+ jassert(k>=0 && k<2048);
+ rs=hbuf[k];
+ huffsize=huffbits;
+ }
+ }
+ if (rs>=0) {
+ if (huffsize+(rs&15)==huffbits) { // done decoding
+ huff4=huff3;
+ huff3=huff2;
+ huff2=huff1;
+ huff1=hash(huffcode, huffbits);
+ rs4=rs3;
+ rs3=rs2;
+ rs2=rs1;
+ rs1=rs;
+ int x=0; // decoded extra bits
+ if (mcupos&63) { // AC
+ if (rs==0) { // EOB
+ mcupos=mcupos+63&-64;
+ jassert(mcupos>=0 && mcupos<=mcusize && mcupos<=640);
+ while (cpos&63) {
+ cbuf2[cpos]=0;
+ cbuf[cpos++]=0;
+ }
+ }
+ else { // rs = r zeros + s extra bits for the next nonzero value
+ // If first extra bit is 0 then value is negative.
+ jassert((rs&15)<=10);
+ const int r=rs>>4;
+ const int s=rs&15;
+ jassert(mcupos>>6==mcupos+r>>6);
+ mcupos+=r+1;
+ x=huffcode&(1<<s)-1;
+ if (s && !(x>>s-1)) x-=(1<<s)-1;
+ for (int i=r; i>=1; --i) {
+ cbuf2[cpos]=0;
+ cbuf[cpos++]=i<<4|s;
+ }
+ cbuf2[cpos]=x;
+ cbuf[cpos++]=s<<4|huffcode<<2>>s&3|12;
+ ssum+=s;
+ }
+ }
+ else { // DC: rs = 0S, s<12
+ jassert(rs<12);
+ ++mcupos;
+ x=huffcode&(1<<rs)-1;
+ if (rs && !(x>>rs-1)) x-=(1<<rs)-1;
+ jassert(mcupos>=0 && mcupos>>6<10);
+ const int comp=color[mcupos>>6];
+ jassert(comp>=0 && comp<4);
+ dc=pred[comp]+=x;
+ jassert((cpos&63)==0);
+ cbuf2[cpos]=dc;
+ cbuf[cpos++]=dc+1023>>3;
+ ssum4=ssum3;
+ ssum3=ssum2;
+ ssum2=ssum1;
+ ssum1=ssum;
+ ssum=rs;
+ }
+ jassert(mcupos>=0 && mcupos<=mcusize);
+ if (mcupos>=mcusize) {
+ mcupos=0;
+ if (++column==width) column=0, ++row;
+ }
+ huffcode=huffsize=huffbits=0, rs=-1;
+
+
+ // UPDATE_ADV_PRED !!!!
+ {
+ const int acomp=mcupos>>6, q=64*qmap[acomp];
+ const int zz=mcupos&63, cpos_dc=cpos-zz;
+ const static int we[8]={181, 282, 353, 456, 568, 671, 742, 767};
+ static int sumu2[8], sumv2[8], sumu3[8], sumv3[8], kx[32];
+ if (zz == 0) {
+ for (int i=0; i<8; i++) {
+ update_k(sumv2[i], sumv3[i], kx[i], kx[i+16]);
+ update_k(sumu2[i], sumu3[i], kx[i+8], kx[i+24]);
+ sumu2[i]=sumv2[i]=sumu3[i]=sumv3[i]=0;
+ }
+ int cpos_dc_ls_acomp = cpos_dc-ls[acomp];
+ int cpos_dc_mcusize_width = cpos_dc-mcusize*width;
+ for (int i=0; i<64; i++) {
+ sumu2[zzu[i]]+=we[zzv[i]]*(zzv[i]&1?-1:+1)*(qtab[q+i]+1)*cbuf2[cpos_dc_mcusize_width+i];
+ sumv2[zzv[i]]+=we[zzu[i]]*(zzu[i]&1?-1:+1)*(qtab[q+i]+1)*cbuf2[cpos_dc_ls_acomp+i];
+ sumu3[zzu[i]]+=(zzv[i]?(zzv[i]&1?-256:256):181)*(qtab[q+i]+1)*cbuf2[cpos_dc_mcusize_width+i];
+ sumv3[zzv[i]]+=(zzu[i]?(zzu[i]&1?-256:256):181)*(qtab[q+i]+1)*cbuf2[cpos_dc_ls_acomp+i];
+ }
+ } else {
+ sumu2[zzu[zz-1]]-=we[zzv[zz-1]]*(qtab[q+zz-1]+1)*cbuf2[cpos-1];
+ sumv2[zzv[zz-1]]-=we[zzu[zz-1]]*(qtab[q+zz-1]+1)*cbuf2[cpos-1];
+ sumu3[zzu[zz-1]]-=(zzv[zz-1]?256:181)*(qtab[q+zz-1]+1)*cbuf2[cpos-1];
+ sumv3[zzv[zz-1]]-=(zzu[zz-1]?256:181)*(qtab[q+zz-1]+1)*cbuf2[cpos-1];
+ }
+ for (int i=0; i<8; ++i) {
+ int k=kx[i];
+ sumv[i]=(sumv2[i]*k+sumv3[i]*(8-k))/8;
+ k=kx[i+8];
+ sumu[i]=(sumu2[i]*k+sumu3[i]*(8-k))/8;
+ }
+
+ for (int i=0; i<3; ++i)
+ for (int st=0; st<8; ++st) {
+ const int zz2 = min(zz+st, 63);
+ int p=(sumu[zzu[zz2]]*i+sumv[zzv[zz2]]*(2-i))/2;
+ p/=(qtab[q+zz2]+1)*181;
+ if (zz2==0) p-=cbuf2[cpos_dc-ls[acomp]], p=(p<0?-1:+1)*ilog(14*abs(p)+1)/10;
+ else p=(p<0?-1:+1)*ilog(10*abs(p)+1)/10;
+ if (st==0) {
+ adv_pred[i]=p;
+ adv_pred[i+4]=p/4;
+ }
+ else if (abs(p)>abs(adv_pred[i])+1) {
+ adv_pred[i]+=st*2+(p>0)<<6;
+ if (abs(p/4)>abs(adv_pred[i+4])+1) adv_pred[i+4]+=st*2+(p>0)<<6;
+ break;
+ }
+ }
+ x=2*sumu[zzu[zz]]+2*sumv[zzv[zz]];
+ for (int i=0; i<8; ++i) {
+ if (zzu[zz]<i) x-=sumu[i];
+ if (zzv[zz]<i) x-=sumv[i];
+ }
+ x/=(qtab[q+zz]+1)*181;
+ if (zz==0) x-=cbuf2[cpos_dc-ls[acomp]];
+ adv_pred[3]=(x<0?-1:+1)*ilog(10*abs(x)+1)/10;
+
+ for (int i=0; i<4; ++i) {
+ const int a=(i&1?zzv[zz]:zzu[zz]), b=(i&2?2:1);
+ if (a<b) x=255;
+ else {
+ const int zz2=zpos[zzu[zz]+8*zzv[zz]-(i&1?8:1)*b];
+ x=(qtab[q+zz2]+1)*cbuf2[cpos_dc+zz2]/(qtab[q+zz]+1);
+ x=(x<0?-1:+1)*ilog(8*abs(x)+1)/8;
+ }
+ lcp[i]=x;
+ }
+ if (column==0) adv_pred[1]=adv_pred[2], adv_pred[0]=1;
+ if (row==0) adv_pred[1]=adv_pred[0], adv_pred[2]=1;
+ } // !!!!
+
+ }
+ }
+ }
+ }
+
+ // Estimate next bit probability
+ if (!jpeg || !data) return next_jpeg;
+ if (buf(1+(!bpos))==FF) {
+ m.add(128);
+ return 1;
+ }
+
+ // Context model
+ const int N=28; // size of t, number of contexts
+ static BH<9> t(MEM); // context hash -> bit history
+ // As a cache optimization, the context does not include the last 1-2
+ // bits of huffcode if the length (huffbits) is not a multiple of 3.
+ // The 7 mapped values are for context+{"", 0, 00, 01, 1, 10, 11}.
+ static Array<U32> cxt(N); // context hashes
+ static Array<U8*> cp(N); // context pointers
+ static StateMap sm[N];
+ static Mixer m1(32, 770, 3);
+ static APM a1(0x8000), a2(0x10000);
+
+
+ // Update model
+ if (cp[N-1]) {
+ for (int i=0; i<N; ++i)
+ *cp[i]=nex(*cp[i],y);
+ }
+ m1.update();
+
+ // Update context
+ const int comp=color[mcupos>>6];
+ const int coef=(mcupos&63)|comp<<6;
+ const int hc=(huffcode*2+(comp==0))|1<<(huffbits+1);
+ static int hbcount=2;
+ if (++hbcount>2 || huffbits==0) hbcount=0;
+ jassert(coef>=0 && coef<256);
+ const int zu=zzu[mcupos&63], zv=zzv[mcupos&63];
+ if (hbcount==0) {
+ int n=0;
+ cxt[0]=hash(++n, hc, coef, adv_pred[2]);
+ cxt[1]=hash(++n, hc, coef, adv_pred[0]);
+ cxt[2]=hash(++n, hc, coef, adv_pred[1]);
+ cxt[3]=hash(++n, hc, rs1, adv_pred[2]);
+ cxt[4]=hash(++n, hc, rs1, adv_pred[0]);
+ cxt[5]=hash(++n, hc, rs1, adv_pred[1]);
+ cxt[6]=hash(++n, hc, adv_pred[2], adv_pred[0]);
+ cxt[7]=hash(++n, hc, cbuf[cpos-width*mcusize], adv_pred[3]);
+ cxt[8]=hash(++n, hc, cbuf[cpos-ls[mcupos>>6]], adv_pred[3]);
+ cxt[9]=hash(++n, hc, lcp[0], lcp[1], adv_pred[1]);
+ cxt[10]=hash(++n, hc, lcp[0], lcp[1], coef);
+ cxt[11]=hash(++n, hc, zu, lcp[0], lcp[2]/3);
+ cxt[12]=hash(++n, hc, zv, lcp[1], lcp[3]/3);
+ cxt[13]=hash(++n, hc, mcupos>>2, min(3, mcupos&63));
+ cxt[14]=hash(++n, hc, coef, column>>1);
+ cxt[15]=hash(++n, hc, column>>2, lcp[0]+256*(lcp[2]/3), lcp[1]+256*(lcp[3]/3));
+ cxt[16]=hash(++n, hc, ssum>>4, coef);
+ cxt[17]=hash(++n, hc, rs1, coef);
+ cxt[18]=hash(++n, hc, mcupos>>3, ssum3>>3, adv_pred[3]);
+ cxt[19]=hash(++n, hc, lcp[0]/3, lcp[1]/3, adv_pred[5]);
+ cxt[20]=hash(++n, hc, cbuf[cpos-width*mcusize], adv_pred[6]);
+ cxt[21]=hash(++n, hc, cbuf[cpos-ls[mcupos>>6]], adv_pred[4]);
+ cxt[22]=hash(++n, hc, adv_pred[2]);
+ cxt[23]=hash(n, hc, adv_pred[0]);
+ cxt[24]=hash(n, hc, adv_pred[1]);
+ cxt[25]=hash(++n, hc, zv, lcp[1], adv_pred[6]);
+ cxt[26]=hash(++n, hc, zu, lcp[0], adv_pred[4]);
+ cxt[27]=hash(++n, hc, lcp[0], lcp[1], adv_pred[3]);
+ }
+
+ // Predict next bit
+ m1.add(128);
+ assert(hbcount<=2);
+ switch(hbcount)
+ {
+ case 0: for (int i=0; i<N; ++i) cp[i]=t[cxt[i]]+1, m1.add(stretch(sm[i].p(*cp[i]))); break;
+ case 1: { int hc=1+(huffcode&1)*3; for (int i=0; i<N; ++i) cp[i]+=hc, m1.add(stretch(sm[i].p(*cp[i]))); } break;
+ default: { int hc=1+(huffcode&1); for (int i=0; i<N; ++i) cp[i]+=hc, m1.add(stretch(sm[i].p(*cp[i]))); } break;
+ }
+
+ m1.set(column==0, 2);
+ m1.set(coef, 256);
+ m1.set(hc&511, 512);
+ int pr=m1.p();
+ m.add(stretch(pr));
+ pr=a1.p(pr, hc&511|(adv_pred[1]&63)<<9, 1023);
+ pr=a2.p(pr, hc&255|coef<<8, 255);
+ m.add(stretch(pr));
+ return 2+(hc&255);
+}
+
+//////////////////////////// wavModel /////////////////////////////////
+
+// Model a 16/8-bit stereo/mono uncompressed .wav file. Return
+// number of channels and bits per sample if a wav file is detected, else 0.
+// Based on 'An asymptotically Optimal Predictor for Stereo Lossless Audio Compression'
+// by Florin Ghido.
+
+ static int S,D;
+ static int wmode;
+
+inline U32 c(int b, int i1=0, int i2=0, int i3=0, int i4=0) {
+ int c;
+ c=buf(i1)>>(8-b);
+ if (i2) c=c<<b|buf(i2)>>(8-b);
+ if (i3) c=c<<b|buf(i3)>>(8-b);
+ if (i4) c=c<<b|buf(i4)>>(8-b);
+ return c;
+ }
+
+inline int s2(int i) {
+ return int(short(buf(i)+256*buf(i-1)));
+}
+
+inline int X(int i, int j) {
+ if (wmode==18) {
+ if (i<=S) return s2(i+j<<2); else return s2((i+j-S<<2)-2);
+ }
+ else if (wmode==17) return s2(i+j<<1);
+ else if (wmode==10) {
+ if (i<=S) return buf(i+j<<1); else return buf((i+j-S<<1)-1);
+ }
+ else return buf(i+j);
+}
+
+int wavModel(Mixer& m) {
+ static int channels; // number of channels
+ static int bits; // bits per sample
+ static int bytes; // bytes per sample
+ static int eof=0; // end of wav
+ static int s=0; // size in bytes
+ static int w,K=128>>(level-1);
+ static int pr[4][2], n[2], counter[2];
+ int chn,ch,msb,j,k,l,i=0;
+ double sum,a=0.996;
+ double F[49][49][2],L[49][49];
+ const int SC=0x20000;
+ static SmallStationaryContextMap scm1(SC), scm2(SC), scm3(SC), scm4(SC), scm5(SC), scm6(SC), scm7(SC), scm8(SC);
+ static ContextMap cm(MEM*4, 10);
+
+ // Detect .wav file header
+ if (!bpos && buf(8)=='d' && buf(7)=='a' && buf(6)=='t' && buf(5)=='a') {
+ for (int i=32; i<=1000; i++)
+ if (buf(i)=='f' && buf(i-1)=='m' && buf(i-2)=='t' && buf(i-3)==' ' && (i2(i-8)==1||i2(i-8)==65534)) {
+ bits=buf(i-22);
+ bytes=bits+7>>3;
+ channels=buf(i-10);
+ w=channels*bytes;
+ s=i4(4);
+ if ((channels==1 || channels==2) && (bits==8 || bits==16)) {
+ eof=pos+s;
+ for (int j=0; j<channels; j++) {
+ for (k=0; k<=S+D; k++) for (l=k; l<=S+D; l++) F[k][l][j]=0;
+ F[1][0][j]=1;
+ n[j]=counter[j]=0;
+ }
+ wmode=channels+bits;
+ printf("WAV %ibits/",bits);
+ if (channels==1) {printf("mono "); S=48; D=0;}
+ else {printf("stereo "); S=36; D=12;}
+ }
+ else eof=pos;
+ }
+ }
+ if (pos>eof) return bits=channels=0;
+
+ // Select previous samples and predicted sample as context
+ if (!bpos) {
+ msb=(pos+s-eof)%bytes;
+ ch=(pos+s-eof)%w;
+ chn=ch/bytes;
+ if (!msb) {
+ for (l=0; l<=S+D; l++) if (l<counter[chn]||(l-S-1>=0&&l-S-1<counter[chn])) F[0][l][chn]=F[0][l][chn]*a+X(0,1)*X(l,1);
+ if (channels==2) {
+ for (l=S+1; l<=S+D; l++) if (l-S-1<counter[chn]) F[S+1][l][chn]=F[S+1][l][chn]*a+X(S+1,1)*X(l,1);
+ for (k=1; k<=S; k++) if (k<counter[chn]) F[k][S+1][chn]=F[k][S+1][chn]*a+X(k,1)*X(S+1,1);
+ }
+ if (++n[chn]==K) {
+ if (channels==1) for (k=1; k<=S+D; k++) for (l=k; l<=S+D; l++) F[k][l][chn]=(F[k-1][l-1][chn]-X(k-1,1)*X(l-1,1))/a;
+ else for (k=1; k<=S+D; k++) if (k!=S+1) for (l=k; l<=S+D; l++) if (l!=S+1) F[k][l][chn]=(F[k-1][l-1][chn]-X(k-1,1)*X(l-1,1))/a;
+ for (i=1; i<=S+D; i++) {
+ sum=F[i][i][chn];
+ for (k=1; k<i; k++) sum-=L[i][k]*L[i][k];
+ if (sum>0) {
+ L[i][i]=sqrt(sum);
+ for (j=(i+1); j<=S+D; j++) {
+ sum=F[i][j][chn];
+ for (k=1; k<i; k++) sum-=L[j][k]*L[i][k];
+ L[j][i]=sum/L[i][i];
+ }
+ }
+ else break;
+ }
+ if (i>S+D && counter[chn]>S+1) {
+ for (k=1; k<=S+D; k++) {
+ F[k][0][chn]=F[0][k][chn];
+ for (j=1; j<k; j++) F[k][0][chn]-=L[k][j]*F[j][0][chn];
+ F[k][0][chn]/=L[k][k];
+ }
+ for (k=S+D; k>0; k--) {
+ for (j=k+1; j<=S+D; j++) F[k][0][chn]-=L[j][k]*F[j][0][chn];
+ F[k][0][chn]/=L[k][k];
+ }
+ }
+ n[chn]=0;
+ }
+ sum=0;
+ for (l=1; l<=S+D; l++) sum+=F[l][0][chn]*X(l,0);
+ pr[3][chn]=pr[2][chn];
+ pr[2][chn]=pr[1][chn];
+ pr[1][chn]=pr[0][chn];
+ pr[0][chn]=int(floor(sum));
+ counter[chn]++;
+ i=0;
+ cm.set(hash(++i, buf(1), ch));
+ cm.set(hash(++i, buf(1), buf(2), ch));
+ cm.set(hash(++i, buf(1), buf(2)>>3, buf(3), ch));
+ cm.set(hash(++i, s2(4)+s2(2)-s2(6)&0xff, ch));
+ cm.set(hash(++i, pr[0][chn]&0xff, ch));
+ cm.set(hash(++i, pr[0][chn]+s2(w)-pr[1][chn]&0xff ,ch));
+ cm.set(hash(++i, pr[0][chn]&0xff, (s2(w)-pr[1][chn]+s2(w*2)-pr[2][chn]>>1)&0xff, ch));
+ cm.set(hash(++i, pr[0][chn]+s2(w)*2-pr[1][chn]*2-s2(w*2)+pr[2][chn]&0xff, ch));
+ scm1.set(s2(w)&0x1ff);
+ scm2.set(s2(w)*2-s2(w*2)&0x1ff);
+ scm3.set(s2(w)*3-s2(w*2)*3+s2(w*3)&0x1ff);
+ }
+ else {
+ cm.set(hash(++i, buf(1), ch));
+ cm.set(hash(++i, buf(1)>>7, buf(2), buf(3)>>7, ch));
+ cm.set(hash(++i, c(7, w,w*2,w*3,w*4), ch));
+ cm.set(hash(++i, c(5, w,w*2,w*3,w*4), c(5, w*5,w*6), ch));
+ cm.set(hash(++i, c(4, w,w*2,w*3,w*4), c(3, w*5,w*6,w*7,w*8), c(2, w*9,w*10,w*11,w*12), ch));
+ cm.set(hash(++i, c(2, w,w*2,w*3,w*4)<<8|c(2, w*5,w*6,w*7,w*8), c(2, w*9,w*10,w*11,w*12)<<8|c(2, w*13,w*14,w*15,w*16), c(2, w*17,w*18,w*19,w*20)<<8|c(2, w*21,w*22,w*23,w*24), ch));
+ cm.set(hash(++i, pr[0][chn]>>8, ch));
+ cm.set(hash(++i, pr[0][chn]+s2(w+1)-pr[1][chn]>>8 ,ch));
+ cm.set(hash(++i, pr[0][chn]>>8, s2(w+1)-pr[1][chn]+s2(w*2+1)-pr[2][chn]>>9, ch));
+ cm.set(hash(++i, pr[0][chn]+s2(w+1)*2-pr[1][chn]*2-s2(w*2+1)+pr[2][chn]>>8, ch));
+ scm1.set(s2(5)+s2(3)-s2(7)-buf(1)+pr[0][chn]>>9);
+ scm2.set(s2(w+1)-buf(1)+pr[0][chn]>>9);
+ scm3.set(s2(w+1)*2-s2(w*2+1)-buf(1)+pr[0][chn]>>8);
+ scm4.set(s2(w+1)*3-s2(w*2+1)*3+s2(w*3+1)-buf(1)>>7);
+ scm5.set(s2(w-1)+s2(w+1)-buf(1)+pr[0][chn]*2>>10);
+ scm7.set(s2(w+1)*4-s2(w*2+1)*6+s2(w*3+1)*4-s2(w*4+1)-buf(1)>>7);
+ scm8.set(s2(w+1)*5-s2(w*2+1)*10+s2(w*3+1)*10-s2(w*4+1)*5+s2(w*5+1)-buf(1)+pr[0][chn]>>9);
+ }
+ }
+
+ // Predict next bit
+ scm1.mix(m);
+ scm2.mix(m);
+ scm3.mix(m);
+ scm4.mix(m);
+ scm5.mix(m);
+ scm6.mix(m);
+ scm7.mix(m);
+ scm8.mix(m);
+ cm.mix(m);
+ return channels<<8|bits;
+}
+
+//////////////////////////// exeModel /////////////////////////
+
+// Model x86 code. The contexts are sparse containing only those
+// bits relevant to parsing (2 prefixes, opcode, and mod and r/m fields
+// of modR/M byte).
+
+// Get context at buf(i) relevant to parsing 32-bit x86 code
+U32 execxt(int i, int x=0) {
+ int prefix=(buf(i+2)==0x0f)+2*(buf(i+2)==0x66)+3*(buf(i+2)==0x67)
+ +4*(buf(i+3)==0x0f)+8*(buf(i+3)==0x66)+12*(buf(i+3)==0x67);
+ int opcode=buf(i+1);
+ int modrm=i ? buf(i)&0xc7 : 0;
+ return prefix|opcode<<4|modrm<<12|x<<20;
+}
+
+void exeModel(Mixer& m) {
+ const int N=12;
+ static ContextMap cm(MEM, N);
+ if (!bpos) {
+ for (int i=0; i<N; ++i)
+ cm.set(execxt(i, buf(1)*(i>4)));
+ }
+ cm.mix(m);
+}
+
+//////////////////////////// indirectModel /////////////////////
+
+// The context is a byte string history that occurs within a
+// 1 or 2 byte context.
+
+void indirectModel(Mixer& m) {
+ static ContextMap cm(MEM, 6);
+ static U32 t1[256];
+ static U16 t2[0x10000];
+
+ if (!bpos) {
+ U32 d=c4&0xffff, c=d&255;
+ U32& r1=t1[d>>8];
+ r1=r1<<8|c;
+ U16& r2=t2[c4>>8&0xffff];
+ r2=r2<<8|c;
+ U32 t=c|t1[c]<<8;
+ cm.set(t&0xffff);
+ cm.set(t&0xffffff);
+ cm.set(t);
+ cm.set(t&0xff00);
+ t=d|t2[d]<<16;
+ cm.set(t&0xffffff);
+ cm.set(t);
+
+ }
+ cm.mix(m);
+}
+
+//////////////////////////// dmcModel //////////////////////////
+
+// Model using DMC. The bitwise context is represented by a state graph,
+// initilaized to a bytewise order 1 model as in
+// http://plg.uwaterloo.ca/~ftp/dmc/dmc.c but with the following difference:
+// - It uses integer arithmetic.
+// - The threshold for cloning a state increases as memory is used up.
+// - Each state maintains both a 0,1 count and a bit history (as in a
+// context model). The 0,1 count is best for stationary data, and the
+// bit history for nonstationary data. The bit history is mapped to
+// a probability adaptively using a StateMap. The two computed probabilities
+// are combined.
+// - When memory is used up the state graph is reinitialized to a bytewise
+// order 1 context as in the original DMC. However, the bit histories
+// are not cleared.
+
+struct DMCNode { // 12 bytes
+ unsigned int nx[2]; // next pointers
+ U8 state; // bit history
+ unsigned int c0:12, c1:12; // counts * 256
+};
+
+void dmcModel(Mixer& m) {
+ static int top=0, curr=0; // allocated, current node
+ static Array<DMCNode> t(MEM*2); // state graph
+ static StateMap sm;
+ static int threshold=256;
+
+ // clone next state
+ if (top>0 && top<t.size()) {
+ int next=t[curr].nx[y];
+ int n=y?t[curr].c1:t[curr].c0;
+ int nn=t[next].c0+t[next].c1;
+ if (n>=threshold*2 && nn-n>=threshold*3) {
+ int r=n*4096/nn;
+ assert(r>=0 && r<=4096);
+ t[next].c0 -= t[top].c0 = t[next].c0*r>>12;
+ t[next].c1 -= t[top].c1 = t[next].c1*r>>12;
+ t[top].nx[0]=t[next].nx[0];
+ t[top].nx[1]=t[next].nx[1];
+ t[top].state=t[next].state;
+ t[curr].nx[y]=top;
+ ++top;
+ if (top==MEM*2) threshold=512;
+ if (top==MEM*3) threshold=768;
+ }
+ }
+
+ // Initialize to a bytewise order 1 model at startup or when flushing memory
+ if (top==t.size() && bpos==1) top=0;
+ if (top==0) {
+ assert(t.size()>=65536);
+ for (int i=0; i<256; ++i) {
+ for (int j=0; j<256; ++j) {
+ if (i<127) {
+ t[j*256+i].nx[0]=j*256+i*2+1;
+ t[j*256+i].nx[1]=j*256+i*2+2;
+ }
+ else {
+ t[j*256+i].nx[0]=(i-127)*256;
+ t[j*256+i].nx[1]=(i+1)*256;
+ }
+ t[j*256+i].c0=128;
+ t[j*256+i].c1=128;
+ }
+ }
+ top=65536;
+ curr=0;
+ threshold=256;
+ }
+
+ // update count, state
+ if (y) {
+ if (t[curr].c1<3800) t[curr].c1+=256;
+ }
+ else if (t[curr].c0<3800) t[curr].c0+=256;
+ t[curr].state=nex(t[curr].state, y);
+ curr=t[curr].nx[y];
+
+ // predict
+ const int pr1=sm.p(t[curr].state);
+ const int n1=t[curr].c1;
+ const int n0=t[curr].c0;
+ const int pr2=(n1+5)*4096/(n0+n1+10);
+ m.add(stretch(pr1));
+ m.add(stretch(pr2));
+}
+
+//////////////////////////// contextModel //////////////////////
+
+typedef enum {DEFAULT, JPEG, BMPFILE4, BMPFILE8, BMPFILE24, TIFFFILE,
+ PGMFILE, RGBFILE, EXE, TEXT} Filetype;
+
+// This combines all the context models with a Mixer.
+
+int contextModel2() {
+ static ContextMap cm(MEM*32, 9);
+ static RunContextMap rcm7(MEM), rcm9(MEM), rcm10(MEM);
+ static Mixer m(800, 3088, 7, 128);
+ static U32 cxt[16]; // order 0-11 contexts
+ static Filetype filetype=DEFAULT;
+ static int size=0; // bytes remaining in block
+// static const char* typenames[4]={"", "jpeg ", "exe ", "text "};
+
+ // Parse filetype and size
+ if (bpos==0) {
+ --size;
+ if (size==-1) filetype=(Filetype)buf(1);
+ if (size==-5) {
+ size=buf(4)<<24|buf(3)<<16|buf(2)<<8|buf(1);
+// if (filetype<=3) printf("(%s%d)", typenames[filetype], size);
+ if (filetype==EXE) size+=8;
+ }
+ }
+
+ m.update();
+ m.add(256);
+
+ // Test for special file types
+ int ismatch=ilog(matchModel(m)); // Length of longest matching context
+ int iswav=wavModel(m); // number of channels and bits per sample if WAV is detected, else 0
+ if (filetype==JPEG){
+ int isjpeg=jpegModel(m); // 1-257 if JPEG is detected, else 0
+ if (isjpeg) {
+ m.set(1, 8);
+ m.set(isjpeg-1, 257);
+ m.set(buf(1), 256);
+ return m.p();
+ }
+ }
+ if (filetype==BMPFILE24 || filetype==TIFFFILE){
+ int isbmp=bmpModel(m); // Image width (bytes) if BMP or TIFF detected, or 0
+ if (isbmp>0) {
+ static int col=0;
+ if (++col>=24) col=0;
+ m.set(2, 8);
+ m.set(col, 24);
+ m.set(buf(isbmp)+buf(3)>>4, 32);
+ m.set(c0, 256);
+ return m.p();
+ }
+ }
+ if (filetype==PGMFILE){
+ if (pgmModel(m)>0) return m.p(); // Image width (bytes) if PGM (P5,PGM_MAXVAL = 255) detected, or 0
+ }
+ if (filetype==BMPFILE8){
+ if (bmpModel8(m)>0) return m.p(); // Image width (bytes) if BMP8 detected, or 0
+ }
+if (filetype==RGBFILE){
+ if (rgbModel8(m)>0) return m.p(); // Image width (bytes) if RGB8 detected, or 0
+ }
+ if (iswav>0) {
+ int bits=iswav&0xff;
+ int tbits=(iswav>>8)*bits;
+ static int col=0;
+ if (++col>=tbits) col=0;
+ if (tbits!=bits) m.set(col, tbits);
+ m.set(col, bits);
+ m.set(c0, 256);
+ return m.p();
+ }
+
+ // Normal model
+ if (bpos==0) {
+ int i;
+ for ( i=15; i>0; --i) // update order 0-11 context hashes
+ cxt[i]=cxt[i-1]*257+(c4&255)+1;
+ for ( i=0; i<7; ++i)
+ cm.set(cxt[i]);
+ rcm7.set(cxt[7]);
+ cm.set(cxt[8]);
+ rcm9.set(cxt[10]);
+ rcm10.set(cxt[12]);
+ cm.set(cxt[14]);
+ }
+ int order=cm.mix(m);
+
+ rcm7.mix(m);
+ rcm9.mix(m);
+ rcm10.mix(m);
+
+ if (level>=4) {
+ sparseModel(m,ismatch,order);
+ distanceModel(m);
+ picModel(m);
+ recordModel(m);
+ wordModel(m);
+ indirectModel(m);
+ dmcModel(m);
+ if (filetype==EXE) exeModel(m);
+ }
+
+
+
+ order = order-2;
+ if(order<0) order=0;
+
+ U32 c1=buf(1), c2=buf(2), c3=buf(3), c;
+
+ m.set(c1+8, 264);
+ m.set(c0, 256);
+ m.set(order+8*(c4>>5&7)+64*(c1==c2)+128*(filetype==EXE), 256);
+ m.set(c2, 256);
+ m.set(c3, 256);
+ m.set(ismatch, 256);
+
+ if(bpos)
+ {
+ c=c0<<(8-bpos); if(bpos==1)c+=c3/2;
+ c=(min(bpos,5))*256+c1/32+8*(c2/32)+(c&192);
+ }
+ else c=c3/128+(c4>>31)*2+4*(c2/64)+(c1&240);
+ m.set(c, 1536);
+ int pr=m.p();
+ return pr;
+}
+
+
+//////////////////////////// Predictor /////////////////////////
+
+// A Predictor estimates the probability that the next bit of
+// uncompressed data is 1. Methods:
+// p() returns P(1) as a 12 bit number (0-4095).
+// update(y) trains the predictor with the actual bit (0 or 1).
+
+class Predictor {
+ int pr; // next prediction
+public:
+ Predictor();
+ int p() const {assert(pr>=0 && pr<4096); return pr;}
+ void update();
+};
+
+Predictor::Predictor(): pr(2048) {}
+
+void Predictor::update() {
+ static APM1 a(256), a1(0x10000), a2(0x10000), a3(0x10000),
+ a4(0x10000), a5(0x10000), a6(0x10000);
+
+ // Update global context: pos, bpos, c0, c4, buf
+ c0+=c0+y;
+ if (c0>=256) {
+ buf[pos++]=c0;
+ c4=(c4<<8)+c0-256;
+ c0=1;
+ }
+ bpos=(bpos+1)&7;
+
+ // Filter the context model with APMs
+ int pr0=contextModel2();
+
+ pr=a.p(pr0, c0);
+
+ int pr1=a1.p(pr0, c0+256*buf(1));
+ int pr2=a2.p(pr0, c0^hash(buf(1), buf(2))&0xffff);
+ int pr3=a3.p(pr0, c0^hash(buf(1), buf(2), buf(3))&0xffff);
+ pr0=pr0+pr1+pr2+pr3+2>>2;
+
+ pr1=a4.p(pr, c0+256*buf(1));
+ pr2=a5.p(pr, c0^hash(buf(1), buf(2))&0xffff);
+ pr3=a6.p(pr, c0^hash(buf(1), buf(2), buf(3))&0xffff);
+ pr=pr+pr1+pr2+pr3+2>>2;
+
+ pr=pr+pr0+1>>1;
+}
+
+//////////////////////////// Encoder ////////////////////////////
+
+// An Encoder does arithmetic encoding. Methods:
+// Encoder(COMPRESS, f) creates encoder for compression to archive f, which
+// must be open past any header for writing in binary mode.
+// Encoder(DECOMPRESS, f) creates encoder for decompression from archive f,
+// which must be open past any header for reading in binary mode.
+// code(i) in COMPRESS mode compresses bit i (0 or 1) to file f.
+// code() in DECOMPRESS mode returns the next decompressed bit from file f.
+// Global y is set to the last bit coded or decoded by code().
+// compress(c) in COMPRESS mode compresses one byte.
+// decompress() in DECOMPRESS mode decompresses and returns one byte.
+// flush() should be called exactly once after compression is done and
+// before closing f. It does nothing in DECOMPRESS mode.
+// size() returns current length of archive
+// setFile(f) sets alternate source to FILE* f for decompress() in COMPRESS
+// mode (for testing transforms).
+// If level (global) is 0, then data is stored without arithmetic coding.
+
+typedef enum {COMPRESS, DECOMPRESS} Mode;
+class Encoder {
+private:
+ Predictor predictor;
+ const Mode mode; // Compress or decompress?
+ FILE* archive; // Compressed data file
+ U32 x1, x2; // Range, initially [0, 1), scaled by 2^32
+ U32 x; // Decompress mode: last 4 input bytes of archive
+ FILE *alt; // decompress() source in COMPRESS mode
+
+ // Compress bit y or return decompressed bit
+ int code(int i=0) {
+ int p=predictor.p();
+ assert(p>=0 && p<4096);
+ p+=p<2048;
+ U32 xmid=x1 + (x2-x1>>12)*p + ((x2-x1&0xfff)*p>>12);
+ assert(xmid>=x1 && xmid<x2);
+ if (mode==DECOMPRESS) y=x<=xmid; else y=i;
+ y ? (x2=xmid) : (x1=xmid+1);
+ predictor.update();
+ while (((x1^x2)&0xff000000)==0) { // pass equal leading bytes of range
+ if (mode==COMPRESS) putc(x2>>24, archive);
+ x1<<=8;
+ x2=(x2<<8)+255;
+ if (mode==DECOMPRESS) x=(x<<8)+(getc(archive)&255); // EOF is OK
+ }
+ return y;
+ }
+
+public:
+ Encoder(Mode m, FILE* f);
+ Mode getMode() const {return mode;}
+ long size() const {return ftell(archive);} // length of archive so far
+ void flush(); // call this when compression is finished
+ void setFile(FILE* f) {alt=f;}
+
+ // Compress one byte
+ void compress(int c) {
+ assert(mode==COMPRESS);
+ if (level==0)
+ putc(c, archive);
+ else
+ for (int i=7; i>=0; --i)
+ code((c>>i)&1);
+ }
+
+ // Decompress and return one byte
+ int decompress() {
+ if (mode==COMPRESS) {
+ assert(alt);
+ return getc(alt);
+ }
+ else if (level==0)
+ return getc(archive);
+ else {
+ int c=0;
+ for (int i=0; i<8; ++i)
+ c+=c+code();
+ return c;
+ }
+ }
+};
+
+Encoder::Encoder(Mode m, FILE* f):
+ mode(m), archive(f), x1(0), x2(0xffffffff), x(0), alt(0) {
+ if (level>0 && mode==DECOMPRESS) { // x = first 4 bytes of archive
+ for (int i=0; i<4; ++i)
+ x=(x<<8)+(getc(archive)&255);
+ }
+ for (int i=0; i<1024; ++i)
+ dt[i]=16384/(i+i+3);
+
+}
+
+void Encoder::flush() {
+ if (mode==COMPRESS && level>0)
+ putc(x1>>24, archive); // Flush first unequal byte of range
+}
+
+/////////////////////////// Filters /////////////////////////////////
+//
+// Before compression, data is encoded in blocks with the following format:
+//
+// <type> <size> <encoded-data>
+//
+// Type is 1 byte (type Filetype): DEFAULT=0, JPEG, EXE
+// Size is 4 bytes in big-endian format.
+// Encoded-data decodes to <size> bytes. The encoded size might be
+// different. Encoded data is designed to be more compressible.
+//
+// void encode(FILE* in, FILE* out, int n);
+//
+// Reads n bytes of in (open in "rb" mode) and encodes one or
+// more blocks to temporary file out (open in "wb+" mode).
+// The file pointer of in is advanced n bytes. The file pointer of
+// out is positioned after the last byte written.
+//
+// en.setFile(FILE* out);
+// int decode(Encoder& en);
+//
+// Decodes and returns one byte. Input is from en.decompress(), which
+// reads from out if in COMPRESS mode. During compression, n calls
+// to decode() must exactly match n bytes of in, or else it is compressed
+// as type 0 without encoding.
+//
+// Filetype detect(FILE* in, int n, Filetype type);
+//
+// Reads n bytes of in, and detects when the type changes to
+// something else. If it does, then the file pointer is repositioned
+// to the start of the change and the new type is returned. If the type
+// does not change, then it repositions the file pointer n bytes ahead
+// and returns the old type.
+//
+// For each type X there are the following 2 functions:
+//
+// void encode_X(FILE* in, FILE* out, int n, ...);
+//
+// encodes n bytes from in to out.
+//
+// int decode_X(Encoder& en);
+//
+// decodes one byte from en and returns it. decode() and decode_X()
+// maintain state information using static variables.
+#define bswap(x) \
++ ((((x) & 0xff000000) >> 24) | \
++ (((x) & 0x00ff0000) >> 8) | \
++ (((x) & 0x0000ff00) << 8) | \
++ (((x) & 0x000000ff) << 24))
+
+// Detect EXE or JPEG data
+Filetype detect(FILE* in, int n, Filetype type) {
+ U32 buf1=0, buf0=0; // last 8 bytes
+ long start=ftell(in);
+
+ // For EXE detection
+ Array<int> abspos(256), // CALL/JMP abs. addr. low byte -> last offset
+ relpos(256); // CALL/JMP relative addr. low byte -> last offset
+ int e8e9count=0; // number of consecutive CALL/JMPs
+ int e8e9pos=0; // offset of first CALL or JMP instruction
+ int e8e9last=0; // offset of most recent CALL or JMP
+ // For BMP detection
+ int bmp=0,bmp0=0,bsize=0,imgbpp=0,bmpx=0,bmpy=0,bmpimgoff=0,imgcomp=-1;
+ // For PGM detection
+ int pgm=0,psize=0,pgmcomment=0,pgmw=0,pgmh=0,pgmsize=0,pgm_ptr=0,pgmc=0;
+ char pgm_buf[32];
+ // For JPEG detection
+ int soi=0, sof=0, sos=0, app=0; // position where found
+ // For .RGB detection
+ int rgbi=0,rgbSTORAGE=-1,rgbBPC=0,rgbDIMENSION=0,rgbZSIZE=0,rgbXSIZE=0,rgbYSIZE=0,rgbsize=0;
+ for (int i=0; i<n; ++i) {
+ int c=getc(in);
+ if (c==EOF) return (Filetype)(-1);
+ buf1=buf1<<8|buf0>>24;
+ buf0=buf0<<8|c;
+
+ // Detect JPEG by code SOI APPx (FF D8 FF Ex) followed by
+ // SOF0 (FF C0 xx xx 08) and SOS (FF DA) within a reasonable distance.
+ // Detect end by any code other than RST0-RST7 (FF D9-D7) or
+ // a byte stuff (FF 00).
+
+ if (!soi && i>=3 && (buf0&0xfffffff0)==0xffd8ffe0) soi=i, app=i+2;
+ if (soi) {
+ if (app==i && ((buf0&0xfff00000)==0xffe00000 || (buf0&0xffff0000)==0xfffe0000))
+ app=i+(buf0&0xffff)+2;
+ if (app<i && i-soi<0x10000 && (buf1&0xff)==0xff
+ && (buf0&0xff0000ff)==0xc0000008)
+ sof=i;
+ if (sof && sof>soi && i-soi<0x10000 && i-sof<0x1000
+ && (buf0&0xffff)==0xffda) {
+ sos=i;
+ if (type!=JPEG) return fseek(in, start+soi-3, SEEK_SET), JPEG;
+ }
+ }
+ if (type==JPEG && sos && i>sos && (buf0&0xff00)==0xff00
+ && (buf0&0xff)!=0 && (buf0&0xf8)!=0xd0)
+ return DEFAULT;
+
+ // Detect .bmp image
+
+ if ((buf0&0xFFFF)==16973) bmp=i; //possible 'BM'
+ if (bmp){
+ if ((i-bmp)==4) bsize=bswap(buf0); //image size
+ if ((i-bmp)==12) bmpimgoff=bswap(buf0);
+ if ((i-bmp)==16 && buf0!=0x28000000) bmp=imgbpp=bsize=0,imgcomp=-1; //if windows bmp
+ if ((i-bmp)==20){
+ bmpx=bswap(buf0); //image x size
+ if (bmpx==0) bmp=imgbpp=bsize=0,imgcomp=-1; // Test big size?
+ }
+ if ((i-bmp)==24){
+ bmpy=bswap(buf0); //image y size
+ if (bmpy==0) bmp=imgbpp=bsize=0,imgcomp=-1;
+ }
+ if ((i-bmp)==27) imgbpp=c; //image bpp
+ if ((i-bmp)==31){
+ imgcomp=buf0; //image compression 0=none, 1=RLE-8, 2=RLE-4
+ if (imgcomp!=0) bmp=imgbpp=bsize=0,imgcomp=-1;}
+ if ((type==BMPFILE4 || type==BMPFILE8 || type==BMPFILE24 ) && (imgbpp==4 || imgbpp==8 || imgbpp==24) && imgcomp==0){
+ int tbsize=0;
+ if (imgbpp==4)
+ if (bsize !=(tbsize=((bmpx*bmpy>>1)+bmpimgoff))) bsize=tbsize;
+ if (imgbpp==8)
+ if (bsize !=(tbsize=(bmpx*bmpy+bmpimgoff))) bsize=tbsize;
+ if (imgbpp==24)
+ if (bsize !=(tbsize=(bmpx*bmpy*3+bmpimgoff))) bsize=tbsize;
+ return fseek(in, start+bsize, SEEK_SET),DEFAULT;
+ }
+ if (imgbpp==4 && imgcomp==0){
+ return fseek(in, start+bmp-1, SEEK_SET),BMPFILE4;
+ }
+ if (imgbpp==8 && imgcomp==0){
+ return fseek(in, start+bmp-1, SEEK_SET),BMPFILE8;
+ }
+ if (imgbpp==24 && imgcomp==0){
+ return fseek(in, start+bmp-1, SEEK_SET),BMPFILE24;
+ }
+ }
+ // Detect .pgm image
+ if ((buf0&0xFFFFFF)==0x50350A) pgm=i; //possible 'P5 '
+ if (pgm){
+ if ((i-pgm)==1 && c==0x23) pgmcomment=1; //pgm comment
+ //not tested without comment
+ if (!pgmcomment && c==0x20 && !pgmw && pgm_ptr) {
+ pgm_buf[pgm_ptr++]=0;
+ pgmw=atoi(pgm_buf);
+ if (pgmw==0) pgm=pgm_ptr=pgmw=pgmh=pgmc=pgmcomment=0;
+ pgm_ptr=0;
+ }
+ if (!pgmcomment && c==0x0a && !pgmh && pgm_ptr){
+ pgm_buf[pgm_ptr++]=0;
+ pgmh=atoi(pgm_buf);
+ if (pgmh==0) pgm=pgm_ptr=pgmw=pgmh=pgmc=pgmcomment=0;
+ pgm_ptr=0;
+ }
+ if (!pgmcomment && c==0x0a && !pgmc && pgm_ptr){
+ pgm_buf[pgm_ptr++]=0;
+ pgmc=atoi(pgm_buf);
+ pgm_ptr=0;
+ }
+ if (!pgmcomment) pgm_buf[pgm_ptr++]=c;
+ if (pgm_ptr>=32) pgm=pgm_ptr=pgmw=pgmh=pgmc=pgmcomment=0;
+ if (pgmcomment && c==0x0a) pgmcomment=0;
+ if (type==PGMFILE && pgmw && pgmh && pgmc){
+ pgmsize=pgmw *pgmh +pgm+i-1;
+ return fseek(in, start+pgmsize, SEEK_SET),DEFAULT;
+ }
+ if (pgmw && pgmh && pgmc){
+ return fseek(in, start+pgm-2, SEEK_SET),PGMFILE;
+ }
+ }
+ // Detect .rgb image
+ if ((buf0&0xFFFF)==0x01DA) rgbi=i;
+ if (rgbi){
+ if ((i-rgbi)==1)
+ if (c==0 || c==1)
+ rgbSTORAGE=c; //0 uncompressed, 1 RLE compressed
+ else
+ rgbi=rgbBPC=rgbDIMENSION=rgbZSIZE=rgbXSIZE=rgbYSIZE=0,rgbSTORAGE=-1;
+ if ((i-rgbi)==2)
+ if (c==1 || c==2)
+ rgbBPC=c;
+ else
+ rgbi=rgbBPC=rgbDIMENSION=rgbZSIZE=rgbXSIZE=rgbYSIZE=0,rgbSTORAGE=-1;
+ if ((i-rgbi)==4)
+ if ((buf0&0xFFFF)==1 || (buf0&0xFFFF)==2 || (buf0&0xFFFF)==3)
+ rgbDIMENSION=buf0&0xFFFF;
+ else
+ rgbi=rgbBPC=rgbDIMENSION=rgbZSIZE=rgbXSIZE=rgbYSIZE=0,rgbSTORAGE=-1;
+ if ((i-rgbi)==6)
+ if ((buf0&0xFFFF)>0)
+ rgbXSIZE=buf0&0xFFFF;
+ else
+ rgbi=rgbBPC=rgbDIMENSION=rgbZSIZE=rgbXSIZE=rgbYSIZE=0,rgbSTORAGE=-1;
+ if ((i-rgbi)==8)
+ if ((buf0&0xFFFF)>0)
+ rgbYSIZE=buf0&0xFFFF,rgbsize=rgbYSIZE*rgbXSIZE+512;
+ else
+ rgbi=rgbBPC=rgbDIMENSION=rgbZSIZE=rgbXSIZE=rgbYSIZE=0,rgbSTORAGE=-1;
+ if ((i-rgbi)==10)
+ if ((buf0&0xFFFF)==1 || (buf0&0xFFFF)==3 || (buf0&0xFFFF)==4) // 1 indicates greyscale
+ // 3 indicates RGB
+ // 4 indicates RGB and Alpha
+ rgbZSIZE=buf0&0xFFFF;
+ else
+ rgbi=rgbBPC=rgbDIMENSION=rgbZSIZE=rgbXSIZE=rgbYSIZE=0,rgbSTORAGE=-1;
+ if (rgbsize != 0 && (i-rgbi)>0 && ((i-rgbi)>rgbsize)){
+ if (type==RGBFILE && rgbZSIZE==1 && rgbSTORAGE==0 ){ //uncompressed greyscale
+ return fseek(in, start+rgbsize, SEEK_SET),DEFAULT;
+ }
+ if (rgbZSIZE==1 && rgbSTORAGE==0){
+ return fseek(in, start+rgbi-1, SEEK_SET),RGBFILE;
+ }
+ }
+ }
+ //TIFF support needed
+ // Detect .tiff file
+
+
+ // Detect EXE if the low order byte (little-endian) XX is more
+ // recently seen (and within 4K) if a relative to absolute address
+ // conversion is done in the context CALL/JMP (E8/E9) XX xx xx 00/FF
+ // 4 times in a row. Detect end of EXE at the last
+ // place this happens when it does not happen for 64KB.
+
+ if ((buf1&0xfe)==0xe8 && (buf0+1&0xfe)==0) {
+ int r=buf0>>24; // relative address low 8 bits
+ int a=(buf0>>24)+i&0xff; // absolute address low 8 bits
+ int rdist=i-relpos[r];
+ int adist=i-abspos[a];
+ if (adist<rdist && adist<0x1000 && abspos[a]>5) {
+ e8e9last=i;
+ ++e8e9count;
+ if (e8e9pos==0 || e8e9pos>abspos[a]) e8e9pos=abspos[a];
+ }
+ else e8e9count=0;
+ if (type!=EXE && e8e9count>=4 && e8e9pos>5)
+ return fseek(in, start+e8e9pos-5, SEEK_SET), EXE;
+ abspos[a]=i;
+ relpos[r]=i;
+ }
+ if (type==EXE && i-e8e9last>0x1000)
+ return fseek(in, start+e8e9last, SEEK_SET), DEFAULT;
+ }
+ return type;
+}
+
+// Default encoding as self
+void encode_default(FILE* in, FILE* out, int len) {
+ while (len--) putc(getc(in), out);
+}
+
+int decode_default(Encoder& en) {
+ return en.decompress();
+}
+
+// JPEG encode as self. The purpose is to shield jpegs from exe transform.
+void encode_jpeg(FILE* in, FILE* out, int len) {
+ while (len--) putc(getc(in), out);
+}
+
+int decode_jpeg(Encoder& en) {
+ return en.decompress();
+}
+// BMP encode as self.
+void encode_bmp(FILE* in, FILE* out, int len) {
+ while (len--) putc(getc(in), out);
+}
+
+int decode_bmp(Encoder& en) {
+ return en.decompress();
+}
+
+// PGM encode as self.
+void encode_pgm(FILE* in, FILE* out, int len) {
+ while (len--) putc(getc(in), out);
+}
+
+int decode_pgm(Encoder& en) {
+ return en.decompress();
+}
+
+// RGB encode as self.
+void encode_rgb(FILE* in, FILE* out, int len) {
+ while (len--) putc(getc(in), out);
+}
+
+int decode_rgb(Encoder& en) {
+ return en.decompress();
+}
+
+// EXE transform: <encoded-size> <begin> <block>...
+// Encoded-size is 4 bytes, MSB first.
+// begin is the offset of the start of the input file, 4 bytes, MSB first.
+// Each block applies the e8e9 transform to strings falling entirely
+// within the block starting from the end and working backwards.
+// The 5 byte pattern is E8/E9 xx xx xx 00/FF (x86 CALL/JMP xxxxxxxx)
+// where xxxxxxxx is a relative address LSB first. The address is
+// converted to an absolute address by adding the offset mod 2^25
+// (in range +-2^24).
+
+void encode_exe(FILE* in, FILE* out, int len, int begin) {
+ const int BLOCK=0x10000;
+ Array<U8> blk(BLOCK);
+ fprintf(out, "%c%c%c%c", len>>24, len>>16, len>>8, len); // size, MSB first
+ fprintf(out, "%c%c%c%c", begin>>24, begin>>16, begin>>8, begin);
+
+ // Transform
+ for (int offset=0; offset<len; offset+=BLOCK) {
+ int size=min(len-offset, BLOCK);
+ int bytesRead=fread(&blk[0], 1, size, in);
+ if (bytesRead!=size) quit("encode_exe read error");
+ for (int i=bytesRead-1; i>=4; --i) {
+ if ((blk[i-4]==0xe8||blk[i-4]==0xe9) && (blk[i]==0||blk[i]==0xff)) {
+ int a=(blk[i-3]|blk[i-2]<<8|blk[i-1]<<16|blk[i]<<24)+offset+begin+i+1;
+ a<<=7;
+ a>>=7;
+ blk[i]=a>>24;
+ blk[i-1]=a>>16;
+ blk[i-2]=a>>8;
+ blk[i-3]=a;
+ }
+ }
+ fwrite(&blk[0], 1, bytesRead, out);
+ }
+}
+
+int decode_exe(Encoder& en) {
+ const int BLOCK=0x10000; // block size
+ static int offset=0, q=0; // decode state: file offset, queue size
+ static int size=0; // where to stop coding
+ static int begin=0; // offset in file
+ static U8 c[5]; // queue of last 5 bytes, c[0] at front
+
+ // Read size from first 4 bytes, MSB first
+ while (offset==size && q==0) {
+ offset=0;
+ size=en.decompress()<<24;
+ size|=en.decompress()<<16;
+ size|=en.decompress()<<8;
+ size|=en.decompress();
+ begin=en.decompress()<<24;
+ begin|=en.decompress()<<16;
+ begin|=en.decompress()<<8;
+ begin|=en.decompress();
+ }
+
+ // Fill queue
+ while (offset<size && q<5) {
+ memmove(c+1, c, 4);
+ c[0]=en.decompress();
+ ++q;
+ ++offset;
+ }
+
+ // E8E9 transform: E8/E9 xx xx xx 00/FF -> subtract location from x
+ if (q==5 && (c[4]==0xe8||c[4]==0xe9) && (c[0]==0||c[0]==0xff)
+ && ((offset-1^offset-5)&-BLOCK)==0) { // not crossing block boundary
+ int a=(c[3]|c[2]<<8|c[1]<<16|c[0]<<24)-offset-begin;
+ a<<=7;
+ a>>=7;
+ c[3]=a;
+ c[2]=a>>8;
+ c[1]=a>>16;
+ c[0]=a>>24;
+ }
+
+ // return oldest byte in queue
+ assert(q>0 && q<=5);
+ return c[--q];
+}
+
+
+
+// Split n bytes into blocks by type. For each block, output
+// <type> <size> and call encode_X to convert to type X.
+void encode(FILE* in, FILE* out, int n) {
+ Filetype type=DEFAULT;
+ long begin=ftell(in);
+ while (n>0) {
+ Filetype nextType=detect(in, n, type);
+ long end=ftell(in);
+ fseek(in, begin, SEEK_SET);
+ int len=int(end-begin);
+ if (len>0) {
+ fprintf(out, "%c%c%c%c%c", type, len>>24, len>>16, len>>8, len);
+ switch(type) {
+ case JPEG: encode_jpeg(in, out, len); break;
+ case BMPFILE4:
+ case BMPFILE8:
+ case BMPFILE24:
+ encode_bmp(in, out, len); break;
+ case PGMFILE: encode_pgm(in, out, len); break;
+ case RGBFILE: encode_rgb(in, out, len); break;
+ case EXE: encode_exe(in, out, len, begin); break;
+ default: encode_default(in, out, len); break;
+ }
+ }
+ n-=len;
+ type=nextType;
+ begin=end;
+ }
+}
+
+// Decode <type> <len> <data>...
+int decode(Encoder& en) {
+ static Filetype type=DEFAULT;
+ static int len=0;
+ while (len==0) {
+ type=(Filetype)en.decompress();
+ len=en.decompress()<<24;
+ len|=en.decompress()<<16;
+ len|=en.decompress()<<8;
+ len|=en.decompress();
+ if (len<0) len=1;
+ }
+ --len;
+ switch (type) {
+ case JPEG: return decode_jpeg(en);
+ case BMPFILE4:
+ case BMPFILE8:
+ case BMPFILE24:
+ return decode_bmp(en);
+ case PGMFILE: return decode_pgm(en);
+ case RGBFILE: return decode_rgb(en);
+ case EXE: return decode_exe(en);
+ default: return decode_default(en);
+ }
+}
+
+//////////////////// Compress, Decompress ////////////////////////////
+
+// Print progress: n is the number of bytes compressed or decompressed
+void printStatus(int n) {
+ if (n>0 && !(n&0x0fff))
+ printf("%12d\b\b\b\b\b\b\b\b\b\b\b\b", n), fflush(stdout);
+}
+
+// Compress a file
+void compress(const char* filename, long filesize, Encoder& en) {
+ assert(en.getMode()==COMPRESS);
+ assert(filename && filename[0]);
+ FILE *f=fopen(filename, "rb");
+ if (!f) perror(filename), quit();
+ long start=en.size();
+ printf("%s %ld -> ", filename, filesize);
+
+ // Transform and test in blocks
+ const int BLOCK=MEM*64;
+ for (int i=0; filesize>0; i+=BLOCK) {
+ int size=BLOCK;
+ if (size>filesize) size=filesize;
+ FILE* tmp=tmpfile();
+ if (!tmp) perror("tmpfile"), quit();
+ long savepos=ftell(f);
+ encode(f, tmp, size);
+
+ // Test transform
+ rewind(tmp);
+ en.setFile(tmp);
+ fseek(f, savepos, SEEK_SET);
+ long j;
+ int c1=0, c2=0;
+ for (j=0; j<size; ++j)
+ if ((c1=decode(en))!=(c2=getc(f))) break;
+
+ // Test fails, compress without transform
+ if (j!=size || getc(tmp)!=EOF) {
+ printf("Transform fails at %ld, input=%d decoded=%d, skipping...\n", i+j, c2, c1);
+ en.compress(0);
+ en.compress(size>>24);
+ en.compress(size>>16);
+ en.compress(size>>8);
+ en.compress(size);
+ fseek(f, savepos, SEEK_SET);
+ for (int j=0; j<size; ++j) {
+ printStatus(i+j);
+ en.compress(getc(f));
+ }
+ }
+
+ // Test succeeds, decode(encode(f)) == f, compress tmp
+ else {
+ rewind(tmp);
+ int c;
+ j=0;
+ while ((c=getc(tmp))!=EOF) {
+ printStatus(i+j++);
+ en.compress(c);
+ }
+ }
+ filesize-=size;
+ fclose(tmp); // deletes
+ }
+ if (f) fclose(f);
+ printf("%-12ld\n", en.size()-start);
+}
+
+// Try to make a directory, return true if successful
+bool makedir(const char* dir) {
+#ifdef WINDOWS
+ return CreateDirectory(dir, 0)==TRUE;
+#else
+#ifdef UNIX
+ return mkdir(dir, 0777)==0;
+#else
+ return false;
+#endif
+#endif
+}
+
+// Decompress a file
+void decompress(const char* filename, long filesize, Encoder& en) {
+ assert(en.getMode()==DECOMPRESS);
+ assert(filename && filename[0]);
+
+ // Test if output file exists. If so, then compare.
+ FILE* f=fopen(filename, "rb");
+ if (f) {
+ printf("Comparing %s %ld -> ", filename, filesize);
+ bool found=false; // mismatch?
+ for (int i=0; i<filesize; ++i) {
+ printStatus(i);
+ int c1=found?EOF:getc(f);
+ int c2=decode(en);
+ if (c1!=c2 && !found) {
+ printf("differ at %d: file=%d archive=%d\n", i, c1, c2);
+ found=true;
+ }
+ }
+ if (!found && getc(f)!=EOF)
+ printf("file is longer\n");
+ else if (!found)
+ printf("identical \n");
+ fclose(f);
+ }
+
+ // Create file
+ else {
+ f=fopen(filename, "wb");
+ if (!f) { // Try creating directories in path and try again
+ String path(filename);
+ for (int i=0; path[i]; ++i) {
+ if (path[i]=='/' || path[i]=='\\') {
+ char savechar=path[i];
+ path[i]=0;
+ if (makedir(path.c_str()))
+ printf("Created directory %s\n", path.c_str());
+ path[i]=savechar;
+ }
+ }
+ f=fopen(filename, "wb");
+ }
+
+ // Decompress
+ if (f) {
+ printf("Extracting %s %ld -> ", filename, filesize);
+ for (int i=0; i<filesize; ++i) {
+ printStatus(i);
+ putc(decode(en), f);
+ }
+ fclose(f);
+ printf("done \n");
+ }
+
+ // Can't create, discard data
+ else {
+ perror(filename);
+ printf("Skipping %s %ld -> ", filename, filesize);
+ for (int i=0; i<filesize; ++i) {
+ printStatus(i);
+ decode(en);
+ }
+ printf("not extracted\n");
+ }
+ }
+}
+
+//////////////////////////// User Interface ////////////////////////////
+
+// Read one line, return NULL at EOF or ^Z. f may be opened ascii or binary.
+// Trailing \r\n is dropped. Line length is unlimited.
+
+const char* getline(FILE *f=stdin) {
+ static String s;
+ int len=0, c;
+ while ((c=getc(f))!=EOF && c!=26 && c!='\n') {
+ if (len>=s.size()) s.resize(len*2+1);
+ if (c!='\r') s[len++]=c;
+ }
+ if (len>=s.size()) s.resize(len+1);
+ s[len]=0;
+ if (c==EOF || c==26)
+ return 0;
+ else
+ return s.c_str();
+}
+
+// int expand(String& archive, String& s, const char* fname, int base) {
+// Given file name fname, print its length and base name (beginning
+// at fname+base) to archive in format "%ld\t%s\r\n" and append the
+// full name (including path) to String s in format "%s\n". If fname
+// is a directory then substitute all of its regular files and recursively
+// expand any subdirectories. Base initially points to the first
+// character after the last / in fname, but in subdirectories includes
+// the path from the topmost directory. Return the number of files
+// whose names are appended to s and archive.
+
+// Same as expand() except fname is an ordinary file
+int putsize(String& archive, String& s, const char* fname, int base) {
+ int result=0;
+ FILE *f=fopen(fname, "rb");
+ if (f) {
+ fseek(f, 0, SEEK_END);
+ long len=ftell(f);
+ if (len>=0) {
+ static char blk[24];
+ sprintf(blk, "%ld\t", len);
+ archive+=blk;
+ archive+=(fname+base);
+ archive+="\r\n";
+ s+=fname;
+ s+="\n";
+ ++result;
+ }
+ fclose(f);
+ }
+ return result;
+}
+
+#ifdef WINDOWS
+
+int expand(String& archive, String& s, const char* fname, int base) {
+ int result=0;
+ DWORD attr=GetFileAttributes(fname);
+ if ((attr != 0xFFFFFFFF) && (attr & FILE_ATTRIBUTE_DIRECTORY)) {
+ WIN32_FIND_DATA ffd;
+ String fdir(fname);
+ fdir+="/*";
+ HANDLE h=FindFirstFile(fdir.c_str(), &ffd);
+ while (h!=INVALID_HANDLE_VALUE) {
+ if (!equals(ffd.cFileName, ".") && !equals(ffd.cFileName, "..")) {
+ String d(fname);
+ d+="/";
+ d+=ffd.cFileName;
+ result+=expand(archive, s, d.c_str(), base);
+ }
+ if (FindNextFile(h, &ffd)!=TRUE) break;
+ }
+ FindClose(h);
+ }
+ else // ordinary file
+ result=putsize(archive, s, fname, base);
+ return result;
+}
+
+#else
+#ifdef UNIX
+
+int expand(String& archive, String& s, const char* fname, int base) {
+ int result=0;
+ struct stat sb;
+ if (stat(fname, &sb)<0) return 0;
+
+ // If a regular file and readable, get file size
+ if (sb.st_mode & S_IFREG && sb.st_mode & 0400)
+ result+=putsize(archive, s, fname, base);
+
+ // If a directory with read and execute permission, traverse it
+ else if (sb.st_mode & S_IFDIR && sb.st_mode & 0400 && sb.st_mode & 0100) {
+ DIR *dirp=opendir(fname);
+ if (!dirp) {
+ perror("opendir");
+ return result;
+ }
+ dirent *dp;
+ while(errno=0, (dp=readdir(dirp))!=0) {
+ if (!equals(dp->d_name, ".") && !equals(dp->d_name, "..")) {
+ String d(fname);
+ d+="/";
+ d+=dp->d_name;
+ result+=expand(archive, s, d.c_str(), base);
+ }
+ }
+ if (errno) perror("readdir");
+ closedir(dirp);
+ }
+ else printf("%s is not a readable file or directory\n", fname);
+ return result;
+}
+
+#else // Not WINDOWS or UNIX, ignore directories
+
+int expand(String& archive, String& s, const char* fname, int base) {
+ return putsize(archive, s, fname, base);
+}
+
+#endif
+#endif
+
+
+// To compress to file1.paq8p: paq8p [-n] file1 [file2...]
+// To decompress: paq8p file1.paq8p [output_dir]
+int paqmain(int argc, char** argv) {
+ bool pause=argc<=2; // Pause when done?
+ try {
+
+ // Get option
+ bool doExtract=false; // -d option
+ if (argc>1 && argv[1][0]=='-' && argv[1][1] && !argv[1][2]) {
+ if (argv[1][1]>='0' && argv[1][1]<='9')
+ level=argv[1][1]-'0';
+ else if (argv[1][1]=='d')
+ doExtract=true;
+ else
+ quit("Valid options are -0 through -9 or -d\n");
+ --argc;
+ ++argv;
+ pause=false;
+ }
+
+ // Print help message
+ if (argc<2) {
+ printf(PROGNAME " archiver (C) 2008, Matt Mahoney et al.\n"
+ "Free under GPL, http://www.gnu.org/licenses/gpl.txt\n\n"
+#ifdef WINDOWS
+ "To compress or extract, drop a file or folder on the "
+ PROGNAME " icon.\n"
+ "The output will be put in the same folder as the input.\n"
+ "\n"
+ "Or from a command window: "
+#endif
+ "To compress:\n"
+ " " PROGNAME " -level file (compresses to file." PROGNAME ")\n"
+ " " PROGNAME " -level archive files... (creates archive." PROGNAME ")\n"
+ " " PROGNAME " file (level -%d, pause when done)\n"
+ "level: -0 = store, -1 -2 -3 = faster (uses 35, 48, 59 MB)\n"
+ "-4 -5 -6 -7 -8 = smaller (uses 133, 233, 435, 837, 1643 MB)\n"
+#if defined(WINDOWS) || defined (UNIX)
+ "You may also compress directories.\n"
+#endif
+ "\n"
+ "To extract or compare:\n"
+ " " PROGNAME " -d dir1/archive." PROGNAME " (extract to dir1)\n"
+ " " PROGNAME " -d dir1/archive." PROGNAME " dir2 (extract to dir2)\n"
+ " " PROGNAME " archive." PROGNAME " (extract, pause when done)\n"
+ "\n"
+ "To view contents: more < archive." PROGNAME "\n"
+ "\n",
+ DEFAULT_OPTION);
+ quit();
+ }
+
+ FILE* archive=0; // compressed file
+ int files=0; // number of files to compress/decompress
+ Array<char*> fname(1); // file names (resized to files)
+ Array<long> fsize(1); // file lengths (resized to files)
+
+ // Compress or decompress? Get archive name
+ Mode mode=COMPRESS;
+ String archiveName(argv[1]);
+ {
+ const int prognamesize=strlen(PROGNAME);
+ const int arg1size=strlen(argv[1]);
+ if (arg1size>prognamesize+1 && argv[1][arg1size-prognamesize-1]=='.'
+ && equals(PROGNAME, argv[1]+arg1size-prognamesize)) {
+ mode=DECOMPRESS;
+ }
+ else if (doExtract)
+ mode=DECOMPRESS;
+ else {
+ archiveName+=".";
+ archiveName+=PROGNAME;
+ }
+ }
+
+ // Compress: write archive header, get file names and sizes
+ String filenames;
+ if (mode==COMPRESS) {
+
+ // Expand filenames to read later. Write their base names and sizes
+ // to archive.
+ String header_string;
+ int i;
+ for ( i=1; i<argc; ++i) {
+ String name(argv[i]);
+ int len=name.size()-1;
+ for (int j=0; j<=len; ++j) // change \ to /
+ if (name[j]=='\\') name[j]='/';
+ while (len>0 && name[len-1]=='/') // remove trailing /
+ name[--len]=0;
+ int base=len-1;
+ while (base>=0 && name[base]!='/') --base; // find last /
+ ++base;
+ if (base==0 && len>=2 && name[1]==':') base=2; // chop "C:"
+ int expanded=expand(header_string, filenames, name.c_str(), base);
+ if (!expanded && (i>1||argc==2))
+ printf("%s: not found, skipping...\n", name.c_str());
+ files+=expanded;
+ }
+
+ // If archive doesn't exist and there is at least one file to compress
+ // then create the archive header.
+ if (files<1) quit("Nothing to compress\n");
+// archive=fopen(archiveName.c_str(), "rb");
+// if (archive)
+// printf("%s already exists\n", archiveName.c_str()), quit();
+ archive=fopen(archiveName.c_str(), "wb+");
+ if (!archive) perror(archiveName.c_str()), quit();
+ fprintf(archive, PROGNAME " -%d\r\n%s\x1A",
+ level, header_string.c_str());
+ printf("Creating archive %s with %d file(s)...\n",
+ archiveName.c_str(), files);
+
+ // Fill fname[files], fsize[files] with input filenames and sizes
+ fname.resize(files);
+ fsize.resize(files);
+ char *p=&filenames[0];
+ rewind(archive);
+ getline(archive);
+ for ( i=0; i<files; ++i) {
+ const char *num=getline(archive);
+ assert(num);
+ fsize[i]=atol(num);
+ assert(fsize[i]>=0);
+ fname[i]=p;
+ while (*p!='\n') ++p;
+ assert(p-filenames.c_str()<filenames.size());
+ *p++=0;
+ }
+ fseek(archive, 0, SEEK_END);
+ }
+
+ // Decompress: open archive for reading and store file names and sizes
+ if (mode==DECOMPRESS) {
+ archive=fopen(archiveName.c_str(), "rb+");
+ if (!archive) perror(archiveName.c_str()), quit();
+
+ // Check for proper format and get option
+ const char* header=getline(archive);
+ if (strncmp(header, PROGNAME " -", strlen(PROGNAME)+2))
+ printf("%s: not a %s file\n", archiveName.c_str(), PROGNAME), quit();
+ level=header[strlen(PROGNAME)+2]-'0';
+ if (level<0||level>9) level=DEFAULT_OPTION;
+
+ // Fill fname[files], fsize[files] with output file names and sizes
+ while (getline(archive)) ++files; // count files
+ printf("Extracting %d file(s) from %s -%d\n", files,
+ archiveName.c_str(), level);
+ long header_size=ftell(archive);
+ filenames.resize(header_size+4); // copy of header
+ rewind(archive);
+ fread(&filenames[0], 1, header_size, archive);
+ fname.resize(files);
+ fsize.resize(files);
+ char* p=&filenames[0];
+ while (*p && *p!='\r') ++p; // skip first line
+ ++p;
+ for (int i=0; i<files; ++i) {
+ fsize[i]=atol(p+1);
+ while (*p && *p!='\t') ++p;
+ fname[i]=p+1;
+ while (*p && *p!='\r') ++p;
+ if (!*p) printf("%s: header corrupted at %d\n", archiveName.c_str(),
+ p-&filenames[0]), quit();
+ assert(p-&filenames[0]<header_size);
+ *p++=0;
+ }
+ }
+
+ // Set globals according to option
+ assert(level>=0 && level<=9);
+ buf.setsize(MEM*8);
+
+ // Compress or decompress files
+ assert(fname.size()==files);
+ assert(fsize.size()==files);
+ long total_size=0; // sum of file sizes
+ for (int i=0; i<files; ++i) total_size+=fsize[i];
+ Encoder en(mode, archive);
+ if (mode==COMPRESS) {
+ for (int i=0; i<files; ++i)
+ compress(fname[i], fsize[i], en);
+ en.flush();
+ printf("%ld -> %ld\n", total_size, en.size());
+ }
+
+ // Decompress files to dir2: paq8p -d dir1/archive.paq8p dir2
+ // If there is no dir2, then extract to dir1
+ // If there is no dir1, then extract to .
+ else {
+ assert(argc>=2);
+ String dir(argc>2?argv[2]:argv[1]);
+ if (argc==2) { // chop "/archive.paq8p"
+ int i;
+ for (i=dir.size()-2; i>=0; --i) {
+ if (dir[i]=='/' || dir[i]=='\\') {
+ dir[i]=0;
+ break;
+ }
+ if (i==1 && dir[i]==':') { // leave "C:"
+ dir[i+1]=0;
+ break;
+ }
+ }
+ if (i==-1) dir="."; // "/" not found
+ }
+ dir=dir.c_str();
+ if (dir[0] && (dir.size()!=3 || dir[1]!=':')) dir+="/";
+ for (int i=0; i<files; ++i) {
+ String out(dir.c_str());
+ out+=fname[i];
+ decompress(out.c_str(), fsize[i], en);
+ }
+ }
+ fclose(archive);
+ programChecker.print();
+ }
+ catch(const char* s) {
+ if (s) printf("%s\n", s);
+ }
+ if (pause) {
+ printf("\nClose this window or press ENTER to continue...\n");
+ getchar();
+ }
+ return 0;
+}
+
+int main(int argc, char **argv)
+{
+#ifndef LLVM
+ return paqmain(argc, argv);
+#else
+ int rc = 1;
+ /* there is lot of static data, need a clean state for decompress to work
+ * properly, so fork */
+ pid_t pid = fork();
+ if (pid == 0) {
+ /* compress files */
+ exit(paqmain(argc, argv));
+ } else if (pid == -1) {
+ perror("fork() failed");
+ exit(1);
+ }
+ wait(&rc);
+ if (rc)
+ return rc;
+ /* now decompress and verify */
+ char *deargv[3];
+ deargv[0] = argv[0];
+ deargv[1] = strdup("-d");
+
+ argc--;
+ argv++;
+ while(argc && argv[0][0] == '-') {argc--; argv++;}
+ String archiveName(argv[0]);
+ archiveName += "."PROGNAME;
+ deargv[2] = strdup(archiveName.c_str());
+ if (paqmain(3, deargv))
+ return 1;
+ free(deargv[1]);
+ unlink(deargv[2]);
+ free(deargv[2]);
+ return 0;
+#endif
+}
Added: test-suite/trunk/MultiSource/Benchmarks/PAQ8p/readme.txt
URL: http://llvm.org/viewvc/llvm-project/test-suite/trunk/MultiSource/Benchmarks/PAQ8p/readme.txt?rev=65042&view=auto
==============================================================================
--- test-suite/trunk/MultiSource/Benchmarks/PAQ8p/readme.txt (added)
+++ test-suite/trunk/MultiSource/Benchmarks/PAQ8p/readme.txt Thu Feb 19 06:02:53 2009
@@ -0,0 +1,40 @@
+paq8p archiver, released by Andreas Morphis(andreasmorphis at yahoo.com), Aug. 22, 2008.
+
+Copyright (C) 2008 Matt Mahoney, Serge Osnach, Alexander Ratushnyak,
+Bill Pettis, Przemyslaw Skibinski, Matthew Fite, wowtiger, Andrew Paterson,
+Jan Ondrus, Andreas Morphis, Pavel L. Holoborodko, KZ, Dark Shikari.
+
+ LICENSE
+
+ This program is free software; you can redistribute it and/or
+ modify it under the terms of the GNU General Public License as
+ published by the Free Software Foundation; either version 2 of
+ the License, or (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details at
+ Visit <http://www.gnu.org/copyleft/gpl.html>.
+
+See http://cs.fit.edu/~mmahoney/compression/ for the latest PAQ version.
+
+Contents:
+
+ readme.txt - This file.
+ paq8p.exe - Win32 executable (Intel compile for Pentium-MMX or higher)
+ paq8p.cpp - C++ source code.
+ paq7asm.asm - NASM source code for 32 bit x86 (Pentium-MMX or higher).
+ paq7asm-x86_64.asm - YASM source code for x86-64.
+ paq7asmsse.asm - NASM/YASM for Pentium 4 (SSE2) or higher in 32 bit mode.
+
+Added Jan. 9, 2009:
+
+ paq7asmsse2.asm - above, but optimized by Dark Shikari.
+ paq8p_sse2.exe - 32 bit Windows g++ compile with above for Pentium 4+.
+
+paq8p
+added wav Model
+minor improvements to bmp Model
+
+See paq8p.cpp source comments for usage and compiling instructions.
More information about the llvm-commits
mailing list