Merge pull request #12 from KrisKennaway/fix-encoder

Modernize and fix some image quality bugs
KrisKennaway · Jan 28, 2023 · ee66fa0
2 parents 4f6d7a7 + a925a89

Showing 12 changed files with 267 additions and 173 deletions.
diff --git a/README.md b/README.md
@@ -1,14 +1,14 @@
-# ]\[-Vision v0.2
+# ]\[-Vision v0.3
 
 Streaming video and audio for the Apple II.
 
 ]\[-Vision transcodes video files in standard formats (.mp4 etc) into a custom format optimized for streaming playback on
 Apple II hardware.
 
 Requires:
-- 64K 6502 Apple II machine (only tested on //gs so far, but should work on older systems)
+- 64K 6502 Apple II machine (tested on //gs and //e but should also work on ]\[/]\[+)
 - [Uthernet II](http://a2retrosystems.com/products.htm) ethernet card
-  - AFAIK no emulators support this hardware so you'll need to run it on a real machine to see it in action
+  - AppleWin ([Windows](https://github.com/AppleWin/AppleWin) and [Linux](https://github.com/audetto/AppleWin)) and [Ample](https://github.com/ksherlock/ample) (Mac) emulate the Uthernet II.  ]\[-Vision has been confirmed to work with Ample.
 
 Dedicated to the memory of [Bob Bishop](https://www.kansasfest.org/2014/11/remembering-bob-bishop/), early pioneer of Apple II
 [video](https://www.youtube.com/watch?v=RiWE-aO-cyU) and [audio](http://www.faddensoftware.com/appletalker.png).
@@ -17,6 +17,8 @@ Dedicated to the memory of [Bob Bishop](https://www.kansasfest.org/2014/11/remem
 
 Sample videos (recording of playback on Apple //gs with RGB monitor, or HDMI via VidHD)
 
+TODO: These are from older versions, for which quality was not as good.
+
 Double Hi-Res:
 - [Try getting this song out of your head](https://youtu.be/S7aNcyojoZI)
 - [Babylon 5 title credits](https://youtu.be/PadKk8n1xY8)
@@ -28,8 +30,6 @@ Older Hi-Res videos:
 - [Paranoimia ft Max Headroom](https://youtu.be/wfdbEyP6v4o)
 - [How many of us still feel about our Apple II's](https://youtu.be/-e5LRcnQF-A)
 
-(These are from older versions, for which quality was not as good)
-
 There may be more on this [YouTube playlist](https://www.youtube.com/playlist?list=PLoAt3SC_duBiIjqK8FBoDG_31nUPB8KBM)
 
 ## Details
@@ -40,7 +40,7 @@ This ends up streaming data at about 100KB/sec of which 56KB/sec are updates to
 
 The video frames are actually encoded at the original frame rate (or optionally by skipping frames), prioritizing differences in the screen content, so the effective frame rate is higher than this if only a fraction of the screen is changing between frames (which is the typical case). 
 
-I'm using the excellent (though under-documented ;) [BMP2DHR](http://www.appleoldies.ca/bmp2dhr/) to encode the input video stream into a sequence of memory maps, then post-processing the frame deltas to prioritize the screen bytes to stream in order to approximate these deltas as closely as possible within the timing budget. 
+I'm using the excellent (though under-documented ;) [BMP2DHR](https://github.com/digarok/b2d) to encode the input video stream into a sequence of memory maps, then post-processing the frame deltas to prioritize the screen bytes to stream in order to approximate these deltas as closely as possible within the timing budget. 
 
 ### KansasFest 2019 presentation
 
@@ -50,27 +50,35 @@ TODO: link video once it is available.
 
 ## Installation
 
-This currently requires python3.7 because some dependencies (e.g. weighted-levenshtein) don't compile with 3.9+, and 3.8
-has a [bug](https://bugs.python.org/issue44439) in object pickling.  
+This currently requires python3.8 because some dependencies (e.g. weighted-levenshtein) don't compile with 3.9+.
 
 ```
-python3.7 -m venv venv
+python3.8 -m venv venv
 source venv/bin/activate
 pip install -r requirements.txt
 ```
 
-To generate the data files required by the transcoder:
+Before you can run the transcoder you need to generate the data files it requires:
 
 ```
 % python transcoder/make_data_tables.py
 ```
 
-This takes about 3 hours on my machine.
+This is a one-time setup.  It takes about 90 minutes on my machine.
+
+## Sample videos
 
-TODO: download instructions
+Some sample videos are available [here](https://www.dropbox.com/sh/kq2ej63smrzruwk/AADZSaqbNuTwAfnPWT6r9TJra?dl=0) for
+streaming (see `server/server.py`).
 
 ## Release Notes
 
+### v0.3 (17 Jan 2023)
+
+- Fixed an image quality bug in the transcoder
+- Documentation/quality of life improvements to installation process
+- Stop using LFS to store the generated data files in git; they're using up all my quota
+
 ### v0.2 (19 July 2019)
 
 #### Transcoder

diff --git a/requirements.txt b/requirements.txt
@@ -1,12 +1,30 @@
+appdirs==1.4.4
 audioread==3.0.0
+certifi==2022.12.7
+cffi==1.15.1
+charset-normalizer==3.0.1
 colormath==3.0.0
+decorator==5.1.1
 etaprogress==1.1.1
+idna==3.4
+importlib-metadata==6.0.0
+joblib==1.2.0
 librosa==0.9.2
-networkx==2.6.3
-numpy==1.21.6
+llvmlite==0.39.1
+networkx==3.0
+numba==0.56.4
+numpy==1.22.4  # Until colormath supports 1.23+
+packaging==23.0
 Pillow==9.4.0
-scikit-learn==1.0.2
+pooch==1.6.0
+pycparser==2.21
+requests==2.28.2
+resampy==0.4.2
+scikit-learn==1.2.0
 scikit-video==1.1.11
-scipy==1.7.3
+scipy==1.10.0
 soundfile==0.11.0
+threadpoolctl==3.1.0
+urllib3==1.26.14
 weighted-levenshtein==0.2.1
+zipp==3.11.0
diff --git a/transcoder/audio.py b/transcoder/audio.py
@@ -55,17 +55,17 @@ def _decode(self, f, buf) -> np.array:
             'float32').reshape((f.channels, -1), order='F')
 
         a = librosa.core.to_mono(data)
-        a = librosa.resample(a, f.samplerate,
-                             self.sample_rate,
+        a = librosa.resample(a, orig_sr=f.samplerate,
+                             target_sr=self.sample_rate,
                              res_type='scipy', scale=True).flatten()
 
         return a
 
     def _normalization(self, read_bytes=1024 * 1024 * 10):
         """Read first read_bytes of audio stream and compute normalization.
 
-        We compute the 2.5th and 97.5th percentiles i.e. only 2.5% of samples
-        will clip.
+        We normalize based on the 0.5th and 99.5th percentiles, i.e. at most 1%
+        of samples will clip.
 
         :param read_bytes:
         :return:
@@ -77,7 +77,7 @@ def _normalization(self, read_bytes=1024 * 1024 * 10):
                 if len(raw) > read_bytes:
                     break
         a = self._decode(f, raw)
-        norm = np.max(np.abs(np.percentile(a, [2.5, 97.5])))
+        norm = np.max(np.abs(np.percentile(a, [0.5, 99.5])))
 
         return 16384. / norm
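
The percentile change in `_normalization` can be tried outside the transcoder. The following is a minimal sketch, not the project's code: `normalization_gain` and `clipped_fraction` are hypothetical helper names, the input is synthetic Gaussian noise rather than decoded audio, and the 16384 target is taken from the `return` statement in the hunk above.

```python
import numpy as np


def normalization_gain(samples, lower=0.5, upper=99.5, target=16384.0):
    """Gain that maps the larger-magnitude percentile onto `target`.

    Hypothetical helper mirroring the expression in the diff:
    16384. / np.max(np.abs(np.percentile(a, [lower, upper])))
    Assumes the signal is not silent (otherwise norm is 0).
    """
    low, high = np.percentile(samples, [lower, upper])
    norm = max(abs(low), abs(high))
    return float(target / norm)


def clipped_fraction(samples, gain, target=16384.0):
    """Fraction of samples whose scaled magnitude exceeds `target`."""
    return float(np.mean(np.abs(samples * gain) > target))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = rng.normal(size=100_000)  # synthetic stand-in for audio
    for lower, upper in [(2.5, 97.5), (0.5, 99.5)]:
        gain = normalization_gain(samples, lower, upper)
        print(lower, upper, clipped_fraction(samples, gain))
```

With the tighter 0.5/99.5 bounds the gain is lower, so the output is quieter overall but fewer samples exceed the target range.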
 

diff --git a/transcoder/data/.gitattributes b/transcoder/data/.gitattributes
diff --git a/transcoder/data/DHGR_palette_0_edit_distance.pickle.bz2 b/transcoder/data/DHGR_palette_0_edit_distance.pickle.bz2
diff --git a/transcoder/data/DHGR_palette_5_edit_distance.pickle.bz2 b/transcoder/data/DHGR_palette_5_edit_distance.pickle.bz2
diff --git a/transcoder/data/HGR_palette_0_edit_distance.pickle.bz2 b/transcoder/data/HGR_palette_0_edit_distance.pickle.bz2
diff --git a/transcoder/data/HGR_palette_5_edit_distance.pickle.bz2 b/transcoder/data/HGR_palette_5_edit_distance.pickle.bz2
diff --git a/transcoder/make_data_tables.py b/transcoder/make_data_tables.py
@@ -1,6 +1,5 @@
-import bz2
 import functools
-import pickle
+import os
 import sys
 from typing import Iterable, Type
 
@@ -17,7 +16,7 @@
 
 
 PIXEL_CHARS = "0123456789ABCDEF"
-
+DATA_DIR = "transcoder/data"
 
 def pixel_char(i: int) -> str:
     return PIXEL_CHARS[i]
@@ -39,7 +38,7 @@ class EditDistanceParams:
     # Smallest substitution value is ~20 from palette.diff_matrices, i.e.
     # we always prefer to transpose 2 pixels rather than substituting colours.
     # TODO: is quality really better allowing transposes?
-    transpose_costs = np.ones((128, 128), dtype=np.float64) * 100000  # 10
+    transpose_costs = np.ones((128, 128), dtype=np.float64)
 
     # These will be filled in later
     substitute_costs = np.zeros((128, 128), dtype=np.float64)
@@ -113,7 +112,7 @@ def compute_edit_distance(
         edp: EditDistanceParams,
         bitmap_cls: Type[screen.Bitmap],
         nominal_colours: Type[colours.NominalColours]
-):
+) -> np.ndarray:
     """Computes edit distance matrix between all pairs of pixel strings.
 
     Enumerates all possible values of the masked bit representation from
@@ -131,44 +130,45 @@ def compute_edit_distance(
 
     bitrange = np.uint64(2 ** bits)
 
-    edit = []
-    for _ in range(len(bitmap_cls.BYTE_MASKS)):
-        edit.append(
-            np.zeros(shape=np.uint64(bitrange * bitrange), dtype=np.uint16))
+    edit = np.zeros(
+        shape=(len(bitmap_cls.BYTE_MASKS), np.uint64(bitrange * bitrange)),
+        dtype=np.uint16)
 
-    # Matrix is symmetrical with zero diagonal so only need to compute upper
-    # triangle
-    bar = ProgressBar((bitrange * (bitrange - 1)) / 2, max_width=80)
+    bar = ProgressBar(
+        bitrange * (bitrange - 1) / 2 * len(bitmap_cls.PHASES), max_width=80)
 
     num_dots = bitmap_cls.MASKED_DOTS
 
     cnt = 0
     for i in range(np.uint64(bitrange)):
-        for j in range(i):
-            cnt += 1
-
-            if cnt % 10000 == 0:
-                bar.numerator = cnt
-                print(bar, end='\r')
-                sys.stdout.flush()
+        pair_base = np.uint64(i) << bits
+        for o, ph in enumerate(bitmap_cls.PHASES):
+            # Compute this in the outer loop since it's invariant under j
+            first_dots = bitmap_cls.to_dots(i, byte_offset=o)
+            first_pixels = pixel_string(
+                colours.dots_to_nominal_colour_pixel_values(
+                    num_dots, first_dots, nominal_colours,
+                    init_phase=ph)
+            )
+
+            # Matrix is symmetrical with zero diagonal so only need to compute
+            # upper triangle
+            for j in range(i):
+                cnt += 1
+                if cnt % 100000 == 0:
+                    bar.numerator = cnt
+                    print(bar, end='\r')
+                    sys.stdout.flush()
+
+                pair = pair_base + np.uint64(j)
 
-            pair = (np.uint64(i) << bits) + np.uint64(j)
-
-            for o, ph in enumerate(bitmap_cls.PHASES):
-                first_dots = bitmap_cls.to_dots(i, byte_offset=o)
                 second_dots = bitmap_cls.to_dots(j, byte_offset=o)
-
-                first_pixels = pixel_string(
-                    colours.dots_to_nominal_colour_pixel_values(
-                        num_dots, first_dots, nominal_colours,
-                        init_phase=ph)
-                )
                 second_pixels = pixel_string(
                     colours.dots_to_nominal_colour_pixel_values(
                         num_dots, second_dots, nominal_colours,
                         init_phase=ph)
                 )
-                edit[o][pair] = edit_distance(
+                edit[o, pair] = edit_distance(
                     edp, first_pixels, second_pixels, error=False)
 
     return edit
@@ -183,13 +183,17 @@ def make_edit_distance(
     """Write file containing (D)HGR edit distance matrix for a palette."""
 
     dist = compute_edit_distance(edp, bitmap_cls, nominal_colours)
-    data = "transcoder/data/%s_palette_%d_edit_distance.pickle.bz2" % (
-        bitmap_cls.NAME, pal.ID.value)
-    with bz2.open(data, "wb", compresslevel=9) as out:
-        pickle.dump(dist, out, protocol=pickle.HIGHEST_PROTOCOL)
+    data = "%s/%s_palette_%d_edit_distance.npz" % (
+        DATA_DIR, bitmap_cls.NAME, pal.ID.value)
+    np.savez_compressed(data, edit_distance=dist)
 
 
 def main():
+    try:
+        os.mkdir(DATA_DIR, mode=0o755)
+    except FileExistsError:
+        pass
+
     for p in palette.PALETTES.values():
         print("Processing palette %s" % p)
         edp = compute_substitute_costs(p)
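
The new storage layout in `compute_edit_distance` and `make_edit_distance` (one 2-D `uint16` array indexed by `[offset, (i << bits) + j]`, written with `np.savez_compressed`) can be illustrated on a toy scale. This is a sketch under stated assumptions: `pair_index`, `toy_distance_table` and `save_and_reload` are hypothetical names, Hamming distance stands in for the real weighted edit distance, and `os.makedirs(..., exist_ok=True)` is used as an alternative to the `try`/`os.mkdir` in the diff (it also creates missing parent directories).

```python
import os
import tempfile

import numpy as np


def pair_index(i: int, j: int, bits: int) -> int:
    """Pack two `bits`-wide values into one flat index, as in the diff."""
    return (i << bits) + j


def toy_distance_table(bits: int, num_offsets: int) -> np.ndarray:
    """Fill only the j < i entries, as the loop in the diff does.

    Hamming distance is a stand-in metric, not the project's edit distance.
    """
    size = 1 << bits
    table = np.zeros((num_offsets, size * size), dtype=np.uint16)
    for i in range(size):
        for o in range(num_offsets):
            for j in range(i):
                table[o, pair_index(i, j, bits)] = bin(i ^ j).count("1")
    return table


def save_and_reload(table: np.ndarray, directory: str) -> np.ndarray:
    """Round-trip the table through a compressed .npz file."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, "toy_edit_distance.npz")
    np.savez_compressed(path, edit_distance=table)
    with np.load(path) as data:
        return data["edit_distance"]


if __name__ == "__main__":
    table = toy_distance_table(bits=3, num_offsets=2)
    with tempfile.TemporaryDirectory() as tmp:
        restored = save_and_reload(table, os.path.join(tmp, "data"))
    print(restored.shape, restored.dtype)
```

A reader of such a file would access the array through the `edit_distance` key, which matches the keyword used in the `np.savez_compressed` call in the diff.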

diff --git a/transcoder/movie.py b/transcoder/movie.py
@@ -6,6 +6,7 @@
 import frame_grabber
 import machine
 import opcodes
+import screen
 import video
 from palette import Palette
 from video_mode import VideoMode
@@ -58,34 +59,54 @@ def encode(self) -> Iterator[opcodes.Opcode]:
         :return:
         """
         video_frames = self.frame_grabber.frames()
-        main_seq = None
-        aux_seq = None
+        op_seq = None
 
         yield opcodes.Header(mode=self.video_mode)
 
+        last_memory_bank = self.aux_memory_bank
         for au in self.audio.audio_stream():
             self.ticks += 1
-            if self.video.tick(self.ticks):
+            new_video_frame = self.video.tick(self.ticks)
+            if new_video_frame:
                 try:
                     main, aux = next(video_frames)
                 except StopIteration:
                     break
 
-                if ((self.video.frame_number - 1) % self.every_n_video_frames
-                        == 0):
+                should_encode_frame = (
+                        (self.video.frame_number - 1) %
+                        self.every_n_video_frames == 0
+                )
+                if should_encode_frame:
+                    if self.video_mode == VideoMode.DHGR:
+                        target_pixelmap = screen.DHGRBitmap(
+                            main_memory=main,
+                            aux_memory=aux,
+                            palette=self.palette
+                        )
+                    else:
+                        target_pixelmap = screen.HGRBitmap(
+                            main_memory=main,
+                            palette=self.palette
+                        )
+
                     print("Starting frame %d" % self.video.frame_number)
-                    main_seq = self.video.encode_frame(main, is_aux=False)
+                    op_seq = self.video.encode_frame(
+                        target_pixelmap, is_aux=self.aux_memory_bank)
+                    self.video.out_of_work = {True: False, False: False}
 
-                    if aux:
-                        aux_seq = self.video.encode_frame(aux, is_aux=True)
+            if self.aux_memory_bank != last_memory_bank:
+                # We've flipped memory banks, start new opcode sequence
+                last_memory_bank = self.aux_memory_bank
+                op_seq = self.video.encode_frame(
+                    target_pixelmap, is_aux=self.aux_memory_bank)
 
             # au has range -15 .. 16 (step=1)
             # Tick cycles are units of 2
             tick = au * 2  # -30 .. 32 (step=2)
             tick += 34  # 4 .. 66 (step=2)
 
-            (page, content, offsets) = next(
-                aux_seq if self.aux_memory_bank else main_seq)
+            (page, content, offsets) = next(op_seq)
 
             yield opcodes.TICK_OPCODES[(tick, page)](content, offsets)
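
The arithmetic in the comments of `encode` (audio value to tick count, and the every-n-frames check) can be verified with a short sketch. `audio_to_tick` and `should_encode_frame` are hypothetical standalone functions written for illustration; the ranges come from the comments in the hunk above, and the range check is an added assumption.

```python
def audio_to_tick(au: int) -> int:
    """Map an audio sample in -15..16 to a tick count in 4..66 (step 2)."""
    if not -15 <= au <= 16:
        raise ValueError("au must be in the range -15..16, got %d" % au)
    tick = au * 2      # -30 .. 32 (step=2)
    return tick + 34   # 4 .. 66 (step=2)


def should_encode_frame(frame_number: int, every_n_video_frames: int) -> bool:
    """True for frames 1, 1 + n, 1 + 2n, ... (frame numbers start at 1)."""
    return (frame_number - 1) % every_n_video_frames == 0


if __name__ == "__main__":
    print([audio_to_tick(au) for au in (-15, 0, 16)])
    print([n for n in range(1, 10) if should_encode_frame(n, 3)])
```

The 32 possible audio values map to 32 distinct even tick counts, which is what the `(tick, page)` lookup into `opcodes.TICK_OPCODES` in the diff keys on.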