Deep Dive Into Image Scaling with Machine Learning Part 1
Back in 2003 I had the amazing idea to use nVidia Cg on my GeForce 4 Ti4400 to accelerate image processing. I coined it imgFX at the time. While I thought I was doing something no one else had, I quickly learned I was not, and eventually shelved the project. Several years later, in May 2008, with the HD revolution rapidly approaching, I revived it and renamed it texelFX.
People had Standard Definition content and wanted to release High Definition content quickly and cheaply. On my Silicon Graphics Octane 2 (dual R12k 400 MHz CPUs, V6 graphics) I was writing a C++ OpenGL application to handle the scaling using the exclusive Silicon Graphics OpenGL extensions. This worked reasonably well, although my scaling techniques were not much more advanced than a Nearest Neighbor scaler - I was struggling with a Bicubic scaler, and the results were sub-par, mostly due to my mid-level programming abilities at the time.
Fast forward to late summer 2017: I upgraded the GPU in my desktop to a GeForce 1080 Ti to take advantage of the numerous CUDA libraries that accelerate the floating point operations image scaling needs. At that time I created the GitHub repo; if I ever stop being embarrassed by my 2008 deep dive, I will commit that code to a separate repo.
The main reason for reviving the project was the news earlier in 2017 that Star Trek: Deep Space Nine would most likely never get a proper 1080p or better remastering. While you could argue that popping the 2002-2003 DVD releases into a UHD-upscaling Blu-ray player makes them look the best they possibly can, I would counter that those players are not taking advantage of machine learning; they simply apply noise reduction along with a bicubic scale.
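For context on what those players are doing: bicubic scaling is typically built on a cubic convolution kernel. Here is a minimal 1-D sketch in Python (the 2-D bicubic case applies this along each axis); the kernel coefficient a = -0.5 is the conventional choice, and the function names are my own for illustration:

```python
def keys_kernel(s, a=-0.5):
    """Cubic convolution kernel used by typical bicubic scalers."""
    s = abs(s)
    if s <= 1:
        return (a + 2) * s**3 - (a + 3) * s**2 + 1
    if s < 2:
        return a * s**3 - 5 * a * s**2 + 8 * a * s - 4 * a
    return 0.0

def cubic_interp_1d(samples, x):
    """Interpolate a 1-D signal at fractional position x using the
    four nearest samples, clamping at the edges."""
    i = int(x)
    total = 0.0
    for k in range(i - 1, i + 3):
        kk = min(max(k, 0), len(samples) - 1)  # clamp at image border
        total += samples[kk] * keys_kernel(x - k)
    return total
```

Note the kernel evaluates to 1 at the sample itself and 0 at its neighbors, so existing pixels pass through unchanged - the kernel only invents the in-between values, which is exactly why it cannot add real detail.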
Where I am today
Over the weekend I ported the .NET Core 2.0 app I wrote back in August 2017 to a more split architecture:
-.NET Core 2.0 library (Contains all of the actual scaler code)
-ASP .NET Core 2.0 MVC App (Providing a quick interface to demonstrate the effectiveness of the algorithms)
-ASP .NET Core 2.0 WebAPI App (Providing a backend to support larger batch processes/mobile uploads/etc)
Along with the port, I finished a Nearest Neighbor implementation using the System.Drawing .NET Core 2.0 NuGet package. This will serve as the baseline against which I will compare my approach, which will utilize the newly released Microsoft Cognitive Toolkit to create a deep convolutional neural network for image scaling. To dive in, take the following screencap from Season 6 Episode 1, "A Time to Stand":
Note the following:
-MPEG-2 compression artifacts in the back left of the screencap, where two crew members are analyzing a screen, and around the light
-Muted colors; granted, DS9 was intentionally muted, especially during the war seasons, but the color space of the DVD is vastly different from that of an HDR UHD release today
Scaling the image to HD (1920x1080):
Without a side-by-side comparison it is a little hard to see just how bad it is, so let's zoom in on the area mentioned above after upscaling:
The issues noted in the DVD screencap are only exacerbated by the scaling, and are even more apparent at the new resolution.
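It is easy to see why: the Nearest Neighbor baseline simply copies the closest source pixel for every destination pixel, so every artifact is copied along with it. My actual baseline uses System.Drawing in C#, but the algorithm is tiny; a Python sketch over a row-major pixel list:

```python
def nearest_neighbor_scale(pixels, src_w, src_h, dst_w, dst_h):
    """Scale a row-major pixel list by copying, for each destination
    pixel, the nearest source pixel - fast, but blocky, and it
    faithfully reproduces every compression artifact in the source."""
    out = []
    for y in range(dst_h):
        sy = y * src_h // dst_h
        for x in range(dst_w):
            sx = x * src_w // dst_w
            out.append(pixels[sy * src_w + sx])
    return out

# Doubling a 2x2 image: each source pixel becomes a 2x2 block.
src = [1, 2,
       3, 4]
print(nearest_neighbor_scale(src, 2, 2, 4, 4))
# -> [1, 1, 2, 2, 1, 1, 2, 2, 3, 3, 4, 4, 3, 3, 4, 4]
```

No new information is created - the 4x4 output contains exactly the same four values as the 2x2 input, just repeated.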
What I hope to Achieve
Given the issues above, my main goals:
-Provide a web interface and REST service to scale single images or videos
-Remove compression artifacts (specifically MPEG-2)
-Apply machine learning to add detail to objects where the source does not have enough pixels (such as a Standard Definition source)
And, with any luck, produce a true High Definition version of Deep Space Nine for myself.
With my goals outlined, the first step is a deep dive into the Microsoft Cognitive Toolkit documentation (https://docs.microsoft.com/en-us/cognitive-toolkit/) to begin training a model that provides a viable solution to goals 2 and 3.
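Before diving into the toolkit, it is worth recalling the building block such a network is made of: the 2-D convolution. A toy pure-Python version is below - the real model will use CNTK's optimized layers, and the hand-picked sharpening kernel here is only a stand-in for the banks of kernels whose weights the network will learn from training data:

```python
def conv2d(img, kernel):
    """Slide one kernel over a 2-D list-of-lists image (no padding).
    A convolutional layer is a bank of such kernels with learned,
    rather than hand-picked, weights."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    out = []
    for y in range(h - kh + 1):
        row = []
        for x in range(w - kw + 1):
            row.append(sum(img[y + i][x + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

# A classic sharpening kernel: the kind of detail-enhancing filter a
# trained network can discover (and improve on) by itself.
sharpen = [[ 0, -1,  0],
           [-1,  5, -1],
           [ 0, -1,  0]]
```

The hope is that, given enough pairs of low- and high-resolution frames, the training process finds far better kernels than any I could pick by hand - ones that suppress MPEG-2 blocking while reinforcing real edges.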