Saturday, November 12, 2016

Playing With The Lytro Illum 1: What is a Light Field?

Depth Map
Depth Map (Normalized to exaggerate depth)
The Lytro Illum is a light field camera. Lytro has an online gallery where you can play with some of their favorite sample images (I don't think any of mine are among their favorites). The basic gimmick is that you can refocus the image after you take the photo. Actually, it goes a little further than that. In theory, given a light field, you can also change the aperture, slightly move the virtual camera (showing a perspective shift), and even simulate a tilt-shift lens.

This sounds like it would make things easier: I could be sloppy, not worry about focus or aperture when I'm taking the photo, and wait until I edit the photo to decide. But it's a bit more complicated than that. First, I'll discuss what it means for something to be "in focus" in a photo, and lead into what a "light field" is and how I think refocusing works.

First, a disclaimer: I don't know the real tech specs of the Lytro Illum beyond what they share on their website. I've only read their manual and watched a bunch of their tutorials on Vimeo. I haven't read the Ren Ng 2006 Stanford dissertation. I've looked over the Levoy/Hanrahan (Stanford) SIGGRAPH 1996 paper (pdf). (I think Hanrahan was Ng's advisor.) I might oversimplify something, or get a "16" cross-wired here or there... but I'll try to cite my sources.

What does it mean to be in-focus?

Before getting into the Illum, it may be useful to discuss what focus really means.

Figure 1: Construction lines showing the blue point would
focus on the sensor and the red point would focus behind
the sensor.
Camera lenses (especially zoom lenses) are more complicated than this, but for the purpose of getting the basic ideas across, the thin lens approximation will do. (You can read more about the thin lens on Wikipedia, which itself just refers to the Hecht Optics text. I think really most college physics texts cover this.)

Consider two points at different distances from the camera: a blue point and a red point. Approximate the camera with just a lens and an image sensor. In Figure 1, the lens is set to focus on the (farther) blue point.

Figure 2: The blue dot will show up in-focus on the image sensor. The
red dot will be defocused, and more so with the wider aperture.
In Figure 2, the lens focuses all light rays coming from the blue point to the same point on the image plane so it is in focus. The red point is at a different depth and is focused to a point behind the image sensor, so at the plane of the image sensor, the light rays are spread out over a wide area.
Also, we can see that a narrower aperture lets fewer light rays in, and those rays land on a narrower section of the image plane. The red point is still not completely in focus, but it is less defocused than with a wide aperture (it has a smaller "circle of confusion").
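
To put rough numbers on Figure 2, here is a minimal Python sketch using the thin lens equation (1/f = 1/s_o + 1/s_i) and similar triangles. The 50 mm focal length, the f/2 and f/8 apertures, and the 5 m and 1 m distances are all values I made up for illustration, and the function names are my own. None of this comes from Lytro.

```python
def image_distance(focal_length_mm, object_distance_mm):
    """Thin lens equation solved for the image distance s_i."""
    return focal_length_mm * object_distance_mm / (object_distance_mm - focal_length_mm)


def blur_diameter(focal_length_mm, f_number, focused_at_mm, object_at_mm):
    """Diameter of the blur circle on the sensor for a point at object_at_mm
    when the lens is focused at focused_at_mm (thin lens approximation)."""
    aperture_mm = focal_length_mm / f_number                  # aperture diameter
    sensor_mm = image_distance(focal_length_mm, focused_at_mm)   # where the sensor sits
    converge_mm = image_distance(focal_length_mm, object_at_mm)  # where the point focuses
    # Similar triangles: the cone of light is aperture_mm wide at the lens
    # and shrinks to zero at converge_mm.
    return aperture_mm * abs(converge_mm - sensor_mm) / converge_mm


# Assumed example values (not from the camera): blue point at 5 m, red at 1 m.
FOCAL_LENGTH_MM = 50.0
BLUE_MM, RED_MM = 5000.0, 1000.0

blue_blur = blur_diameter(FOCAL_LENGTH_MM, 2.0, BLUE_MM, BLUE_MM)
red_blur_wide = blur_diameter(FOCAL_LENGTH_MM, 2.0, BLUE_MM, RED_MM)
red_blur_narrow = blur_diameter(FOCAL_LENGTH_MM, 8.0, BLUE_MM, RED_MM)

print("blue point blur:", blue_blur)
print("red point blur at f/2:", red_blur_wide)
print("red point blur at f/8:", red_blur_narrow)
```

With these made-up numbers, the blue point has no blur at all, and the red point's blur circle shrinks in proportion to the aperture diameter.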

To the right of the image sensor, I approximated what you might see with a narrow and a wide aperture. With the wide aperture, the light from the red dot is spread so wide that it is barely even visible, but the blue dot is still sharp.
Figure 3: Move lens forward to focus on red dot. The blue dot would
focus in front of the sensor so it would be defocused on the sensor.

To focus on the red dot, the lens is moved forward, away from the image sensor (Figure 3). This makes the red dot in-focus on the image sensor. (And the blue dot would be out of focus.)
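
As a rough check on why the lens moves away from the sensor, here is a small Python sketch, again under the thin lens approximation. The focal length and the two subject distances are illustrative values I picked, not anything measured. It also ignores the fact that moving the lens a couple of millimeters slightly changes the subject distance.

```python
def lens_to_sensor_mm(focal_length_mm, subject_distance_mm):
    """Lens-to-sensor distance that brings a subject into focus (thin lens)."""
    return focal_length_mm * subject_distance_mm / (subject_distance_mm - focal_length_mm)


FOCAL_LENGTH_MM = 50.0   # assumed
BLUE_MM = 5000.0         # assumed distance to the farther blue point
RED_MM = 1000.0          # assumed distance to the nearer red point

blue_position = lens_to_sensor_mm(FOCAL_LENGTH_MM, BLUE_MM)
red_position = lens_to_sensor_mm(FOCAL_LENGTH_MM, RED_MM)
lens_travel_mm = red_position - blue_position

print("lens-to-sensor distance for blue:", blue_position)
print("lens-to-sensor distance for red:", red_position)
print("lens moves forward by (mm):", lens_travel_mm)
```

The nearer the subject, the longer the lens-to-sensor distance, which matches the direction of the move in Figure 3.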

So far, I've been describing everything as objects outside the camera projecting light onto the image sensor.

Figure 4: Consider a point on the image sensor and gather the rays
from the full aperture.
Another way to look at this is from the point of view of pixels on the image sensor looking out through the lens.

In Figure 4, consider another point on the image sensor. In a normal camera, the light recorded at this pixel is the total of all light through the aperture, shown as the red shaded region (this is the wide aperture).

These rays continue out into the world until they hit something. Only a very small portion of these will hit our old red dot, so the red dot only makes a small contribution to the value at this pixel. If there were something at the convergence point, that would make up most of the value at this pixel.

There could be many objects in the world that are contributing to the value at this pixel. I first presented the idea of being in-focus as the light coming from different directions from a point in the world all coming together at a point on the image sensor and that blur/defocus came from the light from an object at a different depth being spread out across a larger area of the sensor. Now, I suggest another way to think of it - starting from the image sensor. If all the light arriving at a pixel is coming from the same source, the point will be in-focus. If the light is getting tiny contributions from lots of different objects, it will be out-of-focus.

When viewing a photograph from a normal camera, every pixel contains the contributions of light passing through the whole aperture. But you cannot break down which objects contributed to the value in that pixel (or how far away they were).
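
A toy Python sketch of that information loss is below. The five "rays" per pixel and their values are invented for illustration, and pixel_value is just my own name for "add up everything that came through the aperture."

```python
def pixel_value(ray_contributions):
    """A normal camera pixel only keeps the total of the rays that hit it."""
    return sum(ray_contributions)


# Five rays arriving at one pixel from different parts of the aperture
# (made-up sensor counts).
in_focus_pixel = [20, 20, 20, 20, 20]    # every ray came from the same object
out_of_focus_pixel = [60, 0, 10, 30, 0]  # rays came from several different objects

print(pixel_value(in_focus_pixel))
print(pixel_value(out_of_focus_pixel))
```

Both pixels record the same total, so once the sum is taken there is no way to tell the two situations apart.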

The Light Field and Refocusing

Figure 5: Lytro uses a microlens array to record a section
of rays instead of the full aperture.
This is what is different/magical about the Lytro cameras. The basic idea is that instead of every pixel saving the total contribution through the aperture, it saves the individual directions and contributions of the rays (sort of).

Lytro adds a microlens array in front of the image sensor (Figure 5). At every pixel, there are actually several microlenses gathering light from different directions. (This is just to illustrate the idea. I don't know if the sections overlap, or if it tries to divide the rays evenly or bias to the center...)

Figure 6: Trace rays that are gathered at the image sensor.
Now, we have the tools to think about how refocusing works. Suppose we took a photo with the lens focused on the blue dot, but later, we want to refocus the image on the red dot.

First, pretend that the lens was focused on the red dot (Figure 6). From the point of the image sensor, gather the rays that land at our pixel from the virtual lens at the red position (shown as the three dashed red lines). To the left of the red-focused lens, there are the rays coming into the camera that we're interested in.

Figure 7: Trace rays back through the actual lens,
going back through light field.
But the lens was really focused on the blue dot. Continue the rays back through the real lens position (Figure 7). These three rays trace back to three different points on the image sensor (and the direction is important). The solid red lines in Figure 7 are equivalent to the dashed red lines in Figure 6 (still shown faintly in Figure 7).

The blue rays are the rays that were actually recorded when we took the photo. (If you're following along in the Levoy/Hanrahan paper, I'm only at Section 2 - visualizing the plenoptic function / light slab.)

To simulate the lens focused on the red dot, then for our pixel of interest, we use the rays from the solid red lines. That is the basic idea behind refocusing.

We also can see that it doesn't always work. In Figure 7, there is a blue ray that's pretty close to the top red ray, but as we go further down, the available (blue) ray samples aren't quite lined up with the red rays. For these, we have to find nearby rays and interpolate, so the result will not be as accurate.
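
Here is a one-dimensional toy version of this idea in Python. To be clear, this is a generic "shift and add" sketch with linear interpolation between neighboring samples, not Lytro's actual algorithm (which I don't know). The light field layout (one row of sensor samples per ray direction u), the five directions, and the two points are all assumptions for illustration.

```python
import math


def make_light_field(width, directions, points):
    """Build lf[direction_index][x]. Each point is (x0, disparity): an in-focus
    point has disparity 0, and an out-of-focus point lands at x0 + disparity * u."""
    light_field = []
    for u in directions:
        row = [0.0] * width
        for x0, disparity in points:
            x = x0 + disparity * u
            if 0 <= x < width:
                row[x] += 1.0
        light_field.append(row)
    return light_field


def sample(row, x):
    """Linearly interpolate between the two nearest recorded samples."""
    i = math.floor(x)
    frac = x - i

    def at(k):
        return row[k] if 0 <= k < len(row) else 0.0

    return at(i) * (1.0 - frac) + at(i + 1) * frac


def refocus(light_field, directions, shift):
    """For each output pixel, average one ray per direction, picking the ray
    that a lens focused at the new depth would have sent to that pixel."""
    width = len(light_field[0])
    image = []
    for x in range(width):
        total = 0.0
        for u, row in zip(directions, light_field):
            total += sample(row, x + shift * u)
        image.append(total / len(directions))
    return image


DIRECTIONS = [-2, -1, 0, 1, 2]   # assumed: five ray directions per pixel
BLUE = (5, 0)                    # in focus when the photo was taken
RED = (12, 1)                    # out of focus when the photo was taken

lf = make_light_field(20, DIRECTIONS, [BLUE, RED])
as_shot = refocus(lf, DIRECTIONS, 0)       # same as a normal photo
on_red = refocus(lf, DIRECTIONS, 1)        # refocused on the red point
in_between = refocus(lf, DIRECTIONS, 0.5)  # needs interpolated rays

print("as shot:    blue", as_shot[5], "red", as_shot[12])
print("refocused:  blue", on_red[5], "red", on_red[12])
```

With a shift of 0 the blue point is sharp and the red point is smeared over five pixels. With a shift of 1 it is the other way around. A shift of 0.5 falls between recorded samples, so it has to lean on the interpolation in sample(), which is the "not as accurate" case.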

Also, once we have recorded the light field, we don't need to know anything about the scene in front of the camera.

The Lytro Implementation of the Light Field

Lytro adds an array of micro lenses as a layer between the lens and the image sensor (back in Figure 5). This is all super cool, but it has its limitations.

We can't have an infinite number of ray directions. In practical terms, the more samples we acquire, the more memory and disk storage space we will need.

According to the Lytro Illum Technical Specifications, the Illum has a 40 MegaRay sensor. Their software produces a 2450x1634 (4 Megapixel) image from one of their files. So a quick division would say they average 10 rays per pixel in two dimensions. (My above diagrams have only been one dimension.)
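
The quick division, spelled out as a small Python sketch. It assumes that "40 MegaRays" means 40 million ray samples and that all of them map evenly onto the output pixels, which is my simplification and not something Lytro states.

```python
import math

MEGARAYS = 40_000_000            # from the Illum spec sheet: 40 MegaRays
OUTPUT_WIDTH, OUTPUT_HEIGHT = 2450, 1634

output_pixels = OUTPUT_WIDTH * OUTPUT_HEIGHT   # 4,003,300
rays_per_pixel = MEGARAYS / output_pixels
rays_per_pixel_per_axis = math.sqrt(rays_per_pixel)

print("output pixels:", output_pixels)
print("rays per pixel (2D):", round(rays_per_pixel, 2))
print("rays per pixel along one axis:", round(rays_per_pixel_per_axis, 2))
```

So the roughly 10 rays per pixel in two dimensions works out to only a little over 3 ray directions along each axis, which is the number to compare against my one-dimensional diagrams.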

10 rays/pixel is not a very large number. Their software actually creates an "image stack" of 7 images, so I think that's the number of planes you can really focus on. When I've looked at the individual images in the stack, it looks like they're dividing it up some other way though because I don't see much difference in what is in focus.

In the Lytro Illum User Manual or the Lytro Support description of "refocusable range", see the section on "Depth Composition Features / The refocusable range". If I'm interpreting their diagram and their article about sharpness correctly, it looks like there are really only two depths that you can refocus on with maximum sharpness: the near peak and the far peak. The diagram seems to indicate you can refocus in a little wider range, but that it won't be as sharp. Something I thought was interesting was that in the sharpness article, they say the primary (+0) focus is actually a "low resolution point."

Resolution and Sensor Size

To put the technical specification into context, compare the resolution to my other cameras. I'm getting most of this from Wikipedia. (For reference, I think the Galaxy s3 camera is comparable to an iPhone 5. * )
Camera                        Resolution                               Sensor Size
Feb 2006 Canon 30D            3504x2336 (8.2 Megapixels)               CMOS 22.5x15 mm
Nov 2008 Canon 5D Mark 2      5616x3744 (21 Megapixels)                CMOS 36x24 mm
May 2012 Samsung Galaxy s3    3264x2448 (8 Megapixels)                 CMOS 8.47 x ? mm
July 2014 Lytro Illum         2450x1634 (4 Megapixels, 40 MegaRays)    CMOS 6.4 x 4.8 mm

I'm a little disappointed in these numbers.

I like to have a little breathing space in the resolution for editing (cropping, straightening/rotating). Also, images are just sharper if I take a higher resolution image and resize it down. I was starting to feel the limitations (resolution, sensor size, ISO) of the 30D when I upgraded to the 5Dm2. Both the resolution and sensor size of the Illum are smaller than those of my 30D, which is 8 years older!

I'm not a huge Megapixels guy, but I think the sensor size is important - and the Illum sensor is even smaller than the one in my camera phone! Intuitively, I think a bigger sensor lets you have more space per pixel to gather light, and would have better performance in low light conditions - higher clean ISO.

40 MegaRays is a blessing and a curse. In a normal camera the number of samples is roughly the number of pixels. (I'm going to ignore R, G, B, and Bayer patterns for the purpose of this discussion.) In the Illum, the samples are actually the rays. As discussed above, this is what enables the refocusing capability, and for that, more samples (rays) are better. But you have to fit those on the sensor, and so they are squishing almost twice the samples of the 5Dm2 into a space that is less than 1/5 the linear dimensions (1/28 the area).
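
Here is that comparison as a short Python sketch. The sensor dimensions and sample counts are the ones from my table above (so any error there carries over), and treating one ray as one sample is my simplification.

```python
# (width_mm, height_mm, samples) taken from the table above.
CANON_5DM2 = (36.0, 24.0, 5616 * 3744)   # samples = pixels
LYTRO_ILLUM = (6.4, 4.8, 40_000_000)     # samples = rays (40 MegaRays)


def area_mm2(sensor):
    width_mm, height_mm, _ = sensor
    return width_mm * height_mm


def samples_per_mm2(sensor):
    return sensor[2] / area_mm2(sensor)


sample_ratio = LYTRO_ILLUM[2] / CANON_5DM2[2]             # Illum samples vs. 5Dm2
linear_ratio = CANON_5DM2[0] / LYTRO_ILLUM[0]             # 5Dm2 width vs. Illum width
area_ratio = area_mm2(CANON_5DM2) / area_mm2(LYTRO_ILLUM)
density_ratio = samples_per_mm2(LYTRO_ILLUM) / samples_per_mm2(CANON_5DM2)

print("Illum has", round(sample_ratio, 2), "times the samples")
print("5Dm2 sensor is", round(linear_ratio, 2), "times wider")
print("5Dm2 sensor has", round(area_ratio, 1), "times the area")
print("Illum packs", round(density_ratio, 1), "times the samples per square mm")
```

Taking my table at face value, that is on the order of 50 times as many samples per unit of sensor area as the 5Dm2, so each sample has that much less area to gather light with.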

The Illum sensor is tiny - even smaller than the one in my camera phone, and in practice, I did find that the low light performance was disappointing. The camera claims a top ISO of 3200, and I found even 1600 was noisy. Since the resolution was already pretty low, there wasn't a lot of room to clean it up through resizing. I don't think the low light performance of their sensor justified the tiny sensor size. (Speculation) I think a noisy image comes back to bite me when their software generates the depth map, but that discussion will be a later entry.

A Noisy Image

Depth Map
Depth Map (Normalized to exaggerate depth)
Here's a shot of the Atomic Cherry Bombs. The Illum has a native f-stop of f/2, which lets in a lot of light, so I thought I could get away with using ISO 1600 and a shutter speed of 1/400 sec., which would let me really freeze the image. This was among the first photos I shot that weekend, and they were already gathering backstage ready to go on when I arrived, so I didn't have time to run any tests first or set up a flash.

The camera advertises a top ISO of 3200. With Canon cameras, a top ISO of 3200 usually means that I can shoot at ISO 1600 and still be pretty clean. The above shot was shot at ISO 1600, but it's still got a lot of noise (most visibly, the magenta noise in the dark areas).

There are a few things to focus on:
  • faces/costumes
  • curtains and lights in the back
um... actually, I guess that's pretty much it. Looking at the real depth map, there are really only two shades. They are far enough apart to be able to focus on their faces and have the lights in back noticeably defocus, and vice versa. But that's about it. I wanted to be able to refocus on the foot in the front and transition back to the faces. But there is not a distinct shade for the foot.

There are serious problems in the depth map. Darker means closer and brighter means farther.
  • Look at the dancers themselves. The heads and bodies are approximately the same distance away. The faces and costumes register at a closer depth. Their skin should be at the same depth, but instead, it's the brighter tone, meaning it is registering at the same depth as the curtains behind them.
  • There is not a distinct depth/tone for the feet. The feet should be closer/darker in the depth map than anything else. But instead, the depth map says that they are the same depth as the curtains behind the dancers.
  • The edges of the depth map are splotchy and don't define the real depth edges of the objects in the scene.
(Speculation) I think that the fairly constant/even color of the skin and the noise from the high ISO are confusing the Lytro depth computations.

Depth Map
Going back to the "living picture" at the top of this entry, I think I just got lucky that the depth map recognized the two couples on different planes. Also, the two shades are sufficiently separated.

However, it has its share of problems too.
  • What is the blob on the floor underneath the dancers? Only part of the floor registered as close, but it should go all the way across the frame.
  • There is not a lot of separation between the rear couple and the back wall. Granted, they are close to the back wall. But ideally, we should see a smooth gradient on the floor going all the way back, but instead, there are two artificial blobs.
  • On the closer couple, the number on his back is the only thing that was identified as being on a separate plane.
  • The dark part of the depth map above his head makes no sense. Everything above his head is on the back wall, and should be brighter.

Parting Thoughts

There are a lot of cool ideas that went into the Lytro Illum camera and its ability to refocus. In this entry, I've tried to discuss how it works in terms of the technology and hardware, and what limitations I've experienced.

In future entries, I will discuss the user interface of the camera (probably will be a shorter one), and the processing software (probably will be a couple longer ones).

Other links: (Other people know way more about this than I do.)

* The iPhone 5 and the Galaxy s3 came out around the same time, and both use Sony BSI cameras. iPhone 5 has a Sony Exmor R IMX135. The teardown shows the Galaxy s3 has something in the Sony BSI family, but they say it is not the same as the IMX 145 or IMX 105, so I don't know what it is.
