Behind the Tech with John Carmack: 5k Immersive Video (oculus.com)
395 points by ot on June 19, 2018 | 178 comments



I dunno. Looking at canned content with VR goggles seems like a dead end. It's 3D TV on steroids. Remember 3D TV? Works fine. Too much trouble just to watch TV. Market failure.

It's been five years since the Oculus Rift DK1 appeared, and there's still no killer app. Game developers are pulling back from VR.[1][2] The VR virtual worlds are a disaster. Sansar has about 50 (fifty) concurrent users. SineSpace and High Fidelity have similar numbers.

VR goggles are cool for about an hour, and in the closet in a month.

The technology is coming along just fine, but that's not the problem.

[1] https://www.gq.com/story/is-vr-gaming-over-before-it-even-st... [2] https://mashable.com/2018/01/24/virtual-reality-gaming-loser...


I work in the entertainment industry for a very large company that also produces games. We've had the money, time, and interest to pursue VR. The truth is you're correct. Storytellers are having a really hard time figuring it out. One of the biggest challenges is headset fatigue, which limits the length of story. In conversations with one of the VR manufacturers, they admitted it's one of their biggest problems behind motion sickness.

About 14 million VR headsets have been sold worldwide. That puts it about on par with the NES in the late 80s. I choose to forgive the VR market for not immediately being as successful as the mature gaming market. A lot of things were being done in the early 90s that showed just how little we really understood digital entertainment. It'll take tech improvement and product experimentation combined to really get VR to come into its own. That'll take continued investment on every front. We won't get there by magic.


Just to humor my curiosity: According to the Wikipedia article, the NES sold 62 million units worldwide in total.


Yeah, you have to dig a little, because the NES wasn't discontinued until 1995. If you dig into wikipedia you can see quotes that put units sold in the range I quoted in the late 80s.


In 80s vs 90s?


I'm unsure how comparing current VR hardware sales with a few years of sales of a console from 30 years ago is of any use whatsoever!


Because VR headsets have been selling for ~5 years.

Comparing them with the full lifecycle of the NES sales instead of its first 5 years is not a proper comparison.


I'd go further than that - VR headsets have only been properly selling for about two years at this point. The Oculus Rift didn't officially launch a consumer version until mid-2016, and the HTC Vive launched around the same time. While Oculus headsets have been available in some form for five years, everything before the 2016 launch was a prototype development-kit version that came with a big asterisk on the order page saying it wasn't meant for general consumer use.

Lots of enthusiasts did buy the Oculus DK1 and DK2 Rifts, but CV1 was the first time it was actually available in stores and marketed as being ready for consumer use.


Why compare them to NES sales at all?


If headset fatigue & motion sickness were non-issues, what would you make?


I could answer, but I don't know how important it is. At the pitch level, this question isn't really the issue. People have been dreaming of VR pitches for decades...which is partly why the possibility of real VR has captured the imagination of so many people.

At a more tactical level, such as beats, storyboarding, and even UX research...there's still a lot of experimentation when it comes to what works and what doesn't for storytellers. If you tell an experienced director to make an action movie, there are a lot of well-understood techniques they can lean on to tell a good story. So far, our experience shows that a lot of that flies out the window with VR unless you spend all your energy trying to encourage your user to frame the shot the way the director wants. However, if you do that, you aren't really using VR anymore.

The constraints on story length are a real problem because you can only give a story so much depth in 15 minutes...and if you don't have the opportunity to figure out how to tell a story with nuance, you won't learn how to get nuance right. VR has no nuance at the moment.


Your last sentence where you mention 'magic' made me laugh. Nice one. I agree there will be no 'magic leap' forward, so to speak; however, things are progressing nicely.

The indie game scene on Steam VR is thriving, and these experiments in VR experiences will yield future content and entertainment breakthroughs. The groundwork, at least, has finally begun.


Maybe for the average consumer market, but there are lots of industries that currently use simulation software and are slowly moving to VR, away from expensive multi-screen (very permanent) setups.

Take "Verification of Competence" activities used to on-board and train people who use specific plant/machinery. If you want to work on a mine driving a large truck, you'll usually have to do a virtual test to show your prospective employer that you can efficiently pilot $500,000 worth of machinery.

Health and safety training in complex plants. Run simulations on an immersive virtual model, rather than sitting at a computer screen.

Train driver training. Drivers need to do a particular number of trips to learn the route (accelerating and braking for turns, stations, etc.); this can be done in the field or virtually, but it is required to ensure safety.


I agree it's very unclear whether it'll ever have mass consumer appeal, but it definitely has current viability in more niche environments. If you call "business" a niche ;).

B2B++ B2C--


> Game developers are pulling back from VR

Some are, others are going all in, and it's just getting good. Compare Lone Echo to the launch experiences: it's a completely different experience. Not to mention Fallout VR and Skyrim VR launched this year.

> The VR virtual worlds are a disaster

Go on YouTube and search for VRChat; check out the number of videos and view counts. VR might not be mainstream yet, and I don't believe it will be until standalone headsets drop, but it's naive to write it off just yet.


Oculus Go is here for $200. I've been telling all my friends to get it. Got one for my parents.


Their marketing is failing; how come I didn't hear of it...

I want something like that. I bought the old Xiaomi headset (basically a Google Cardboard) and I can't run Netflix on it (which was what I wanted it for :( )

The Standalone Xiaomi one does seem to run Netflix though :)


> ...check out the number of videos and view counts...

Why do you think that is an indicator of success? VR has plenty of hype; marketing is not the issue. There are plenty of people who are interested in watching a video on YouTube that won't buy a headset.


>Why do you think that is an indicator of success?

Because this is what people in the video game industry are literally looking at as a gauge of success.

Just look how much they court youtubers and streamers. Games are literally built around them in 2018.

(Note, I don’t like that this is true but the sad fact is that it is true)


You can keep believing that, sure. I'll just be over here, continuing to earn my living delivering interactive 360 video and VR demonstrations for our clients :-)


You making a living off this doesn't invalidate the fact that VR is having a hard time catching on.


I'm a biased observer, sure. But I feel very confident in saying that VR is never ever going away, let alone already dead like OP was saying, or even remotely comparable to 3D TV.

It's not (as it stands) going to be adopted as the media platform of choice for consuming mass amounts of Netflix and Marvel movies, but there are way more niches where it's a viable go-to solution for many problems.


Is the video still produced as ordinary 2d/planar video? It's just projected as 360? Or is there some kind of depth information? What makes it so compelling?


Most of our video is rendered down to ordinary video that gets reprojected, driven mostly by a requirement to be playable on lowest-common-denominator hardware and platforms; nothing near as cool as what Carmack's article details. But there are other problems that require fun solutions, too.


I’m curious: what company/industry are you with and what tech are you using specifically?


I work alongside civil/acoustic/hydraulic engineers. We use a whole bunch of tools for various productions, but I spend most of my time in 3DSMAX, Unreal Engine and Visual Studio.


Selling 3D TVs to stores was briefly popular too, but that dried up once the stores discovered that they were having a hard time selling them.


That doesn't validate it for users though, only that other businesses believe it has traction and value.


One of my investment principles is to "always bet on laziness"

VR goggles are the opposite of laziness for me.

Too much effort required to set everything up. Why do that much work for little reward if I can just sit back in my couch with a remote and play a game?


> VR goggles are the opposite of laziness for me.

> Too much effort required to set everything up.

Which is why Oculus Go is a (moderate) success. It's quick and easy to grab and use. It has its problems, sure (lack of 6DoF in the headset and controller, and lack of content), but we'll get there soon, I think.


It's still very close and amazing, IMO. I've been non-stop on my Oculus Go since I bought one.


I think you're right overall. That said, VR seems to be a great fit for certain types of games, especially if they've been designed from the ground-up for VR.

As an example, I'm not sure if playing Civilization 6 in VR would be so earth shattering that I'd go through the effort (though who knows?). On the other hand, for a game like Elite: Dangerous, where essentially it's all about immersion in the experience of flying a spaceship, VR could / can be awesome. Even a simple head-tracking device can transform the setting from "hey, I'm playing a game here" to "hey, I'm sitting in a space ship here".


Oculus Go has zero setup.


I don't think Carmack was speaking to any kind of content in particular (canned or otherwise). This was more of an update on the current state of the technology.

In any case, I think it's early to say what's a dead end. As you say, there are no killer apps yet. Even VR porn is still at a demo stage. It all might be a dead end, if you want to go on existence proof alone.

Honestly, I was expecting one of the VR platforms to attract simple applications/games. Resolution doesn't matter much if you're looking at stick figures.

If big, epic content is what it will take to get VR to that "real product" stage, then I think the platforms will have to produce it. I could imagine "canned" content similar to old Imax movies working well, but that is not a two-guys-and-a-camera type of job. Who would put up that kind of money, for such a small audience, when all the first-mover benefits go to the platforms?


VRChat for Steam currently has 6k concurrent users, with peaks of 20k during holiday periods: http://steamcharts.com/app/438100


I wouldn't rush to call VR a market failure; VR sales are steadily increasing.

What a lot of people don't see are the iconoclastic milestones that are coming up, that will transform the entire domain in terms of market acceptance and saturation.

Foveated rendering is the first of those. Hand-finger tracking (e.g. a glove interface) is the second.

These technologies are not science fiction, we know they are coming out since lab prototypes already exist. Moreover, impact-wise, these technologies are not really incremental improvements but complete game changers (thus iconoclastic). The VR landscape in 5 years will look completely different to today.


I just got the new Oculus Go and it is a game changer. Cheap, made for mass consumption, and impressive as hell. I thought it would be another Cardboard, but no, it's the revolution we were all waiting for. I'm writing a blog post about my first game of Catan online, playing a board game with two strangers from different places in the world. It was insane, an experience I've never had before.


VR didn't fail. Virtual reality itself is software; VR hardware is merely an interface. The headset display is going to be replaced by something better. There are many ways to provide virtual reality without the flaws of headsets. We're going to look back at headsets the same way we see CRT monitors.


We already look at them that way - heavy and obsolete.


The problem is all the great stuff is on Vive and Rift which is inconvenient to jump into. Then you have Oculus Go which is convenient to jump into, but there's not much great stuff due to lack of hands (crippled interactivity), and will get shelved with little re-engagement.

The Second Life continuations aren't the best examples to measure VR's traction. I'd look at user re-engagement of apps on the level of Bigscreen, Rec Room, Beat Saber, Google Earth. Those apps get people back into the headset and capable of logging 1000+ hour playtimes. Once high-end capabilities (6DoF head / hands) meets convenient form factor (standalone, wireless), I expect to see VR gain wider traction.


You do have a controller for oculus go and it works quite well if you're not moving.


It's only one controller that can only wiggle.


You'd be surprised how much you can do with it. Try Virtual Virtual Reality; it just feels like you can freely move it. The illusion is strong.


I've tried it all. VVR was okay for like a few minutes. I didn't feel any illusion, flipping a pancake and such still felt like a wiggler. It's not just "feeling" like being able to move it, 6DoF enables an entire class of interactions that is impossible on 3DoF (e.g., having your hand doing something even if you're not looking at it).


I bought the devkit, the poor support on Linux was enough for me to pretty much abandon it. I'm not developing on Windows, I've made it this far and will instead target other platforms.


Hardly anything ever had good support on Linux. Linux has its strengths, but Windows is a far better platform for graphics-related development.


That's just not true anymore. Even AMD has great open-source drivers on Linux that you can only dream about on Windows.


Unreal Engine, including all the dev tools, runs fine on Linux.


I got a PS4 VR for Christmas and had to return it. I wanted sooo much to like it, but the nausea and fogged lenses just happened too quickly and too often.

I just don't want to wear VR stuff. I'd rather get something like in The Lawnmower Man, without all this gear strapped to me. Is anyone doing anything like that?


Look at the bright side. At least if Oculus fails (or rather, when Facebook finally gets the clue that it has already failed), Carmack will be right there making Facebook's shadow trackers extremely efficient, and if that's not a worthy mission for the end of one's career, I don't know what is.


Agree, but any hope for AR? Apple's been plugging away, and Magic Leap keeps getting investors...


I heard the same about smartwatches, and it was true until they went for the fitness-tracker niche. It's an overhyped niche, that we agree on.


Unless we have brain implants then VR is not what people want.

I certainly don't want to be wearing any kind of apparatus that distinguishes me from a normal individual.

If we do have brain implants I don't want advertising and tracking.

I would pay for a direct internet connection to my head only if I know it is anonymous and I control it.

VR in its current state is pretty lame if you like the real world too... imho; and it isn't going to take increased resolution to fix it.


Do you feather the edges of the high resolution videos to blend with the low resolution background? I found this necessary when I implemented something similar because otherwise the borders are far too obvious, though it feels bad to waste those pixels at the edge of the video.

How are the videos stored? Is each gop a separate file or do you seek around in larger files? Is it feasible to stream over HTTP or is it only possible to play local videos for now?
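For readers wondering what the feathering mentioned above looks like in practice, here is a minimal numpy sketch of the idea: fade the high-res patch into the low-res background with a linear alpha ramp near its border. The border width and array layout are arbitrary assumptions, not anyone's actual player code.

    import numpy as np

    def feather_overlay(low_res, high_res, border_px=16):
        # Composite a high-res patch over the matching region of the low-res
        # background, fading the patch out linearly over border_px pixels at its
        # edges so the seam is less visible. Inputs are HxWx3 floats in [0, 1].
        h, w = high_res.shape[:2]
        ramp_y = np.clip(np.minimum(np.arange(h), np.arange(h)[::-1]) / border_px, 0, 1)
        ramp_x = np.clip(np.minimum(np.arange(w), np.arange(w)[::-1]) / border_px, 0, 1)
        alpha = np.minimum.outer(ramp_y, ramp_x)[..., None]  # 1 inside, 0 at the border
        return alpha * high_res + (1 - alpha) * low_res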


I was prepared to blend the edges, but it turned out not to be necessary. If the compression ratio was increased enough that there were lots of artifacts in the low-res version, it might be more important.

I was originally going to put it into an mp4 file with the base stored first, so normal video players could at least play the low res version, but the Android MediaExtractor fails when presented with more than 10 tracks, so I just rolled my own trivial container file.

Peak bitrate for Henry is around 40 Mbps, so it wouldn't stream for most people. With some rearrangement of the file so each strip has a full gop contiguous, instead of time-interleaving all 11, the bitrate would be cut in half, but it would still be a lot of fairly small requests, so it would call for pipelined HTTP/2.


Ah, so all the frames for every strip are interleaved in your container, and you just read sequentially and ignore the frames you don't need? That's probably the right thing for local videos.

I had each gop in a separate file like HLS or DASH (except for the background which was a single file also containing the audio track). It's unwieldy but makes HTTP streaming a little simpler because you don't need range requests or an index.

Also, instead of bitstream hacking to stitch three strips into one, I encoded multiple strips into "pre-stitched" views. This means that every strip is encoded redundantly in multiple views, bloating the on-disk video size. But for streaming that only affects the server, not the client, and it's nice for the client to only download/decode one view at a time (plus the background) instead of three strips. Bitstream hacking to join the strips would definitely be better if it can work, though.
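To illustrate the per-GOP streaming scheme described above, here is a toy sketch of how a client might pick which pre-stitched view segment to fetch for the current head yaw. The view count, spacing, and file naming are all hypothetical, not any real format.

    def view_segment_for(yaw_degrees, gop_index, num_views=8):
        # Choose the pre-stitched view whose center is nearest the current yaw,
        # assuming num_views views spaced evenly around 360 degrees, and build a
        # DASH-like segment name for that view and GOP (names are made up).
        step = 360.0 / num_views
        view = int(round((yaw_degrees % 360.0) / step)) % num_views
        return "view%02d/gop%05d.m4s" % (view, gop_index)

    print(view_segment_for(95.0, 42))   # view02/gop00042.m4s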


Finally a John Carmack post that isn't on Facebook! Great read too, can't wait to see more.


I think the distinction is between personal experiences and activity on projects, which requires review to determine whether disclosing the information might reveal trade secrets.


It mentions "...5120 pixels across 360 degrees is a good approximation. Anyone over this on current VR headsets is simply wasting resources and inviting aliasing."

How does increasing the resolution cause aliasing? Shouldn't it be the opposite?


Today's VR headsets are low resolution. If your video source is higher resolution than the display, then it must be downsampled. Downsampling will introduce aliasing artifacts unless an anti-aliasing pass is performed prior to downsampling.

> Image scaling can be interpreted as a form of image resampling or image reconstruction from the view of the Nyquist sampling theorem. According to the theorem, downsampling to a smaller image from a higher-resolution original can only be carried out after applying a suitable 2D anti-aliasing filter to prevent aliasing artifacts. The image is reduced to the information that can be carried by the smaller image.

- https://en.wikipedia.org/wiki/Image_scaling

In an ideal situation, your source resolution would match your target resolution. If it doesn't, then you must expend resources scaling. VR can't leverage the 2D scalers found in most hardware, since the source data is being mapped into 3D space. If you have to downscale your source, then on top of the processing resources used to scale in 3D space, you're also wasting bandwidth delivering that content at a higher resolution.
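A toy numpy illustration of that Nyquist point: a one-pixel stripe pattern reduced 2:1 by plain decimation aliases into a solid field, while a crude box filter applied first preserves the average intensity. The data is entirely synthetic, just to make the argument concrete.

    import numpy as np

    src = np.tile([0.0, 1.0], 256)[None, :].repeat(512, axis=0)    # 512x512 stripe pattern

    naive = src[::2, ::2]                                      # drop every other sample
    filtered = src.reshape(256, 2, 256, 2).mean(axis=(1, 3))   # average 2x2 blocks first

    print(naive.min(), naive.max())        # 0.0 0.0  -> stripes alias into solid black
    print(filtered.min(), filtered.max())  # 0.5 0.5  -> correct mid-grey average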


I really don't like the idea that this is "good enough" just because today's headsets limit us. Technology innovation can come from every area. If others make better 360 cameras or players and exceed the 5K experience, that should not be considered "wasted" just because one kind of headset display doesn't reach the same level. I do like the Oculus Go for its nice price and acceptable experience, but there are quite a few headsets that perform better, or at least are capable of displaying 8K or above: the Vive Focus, Gear VR with the S8 and S9, and the upcoming Pimax. So please don't let this block others and ourselves from technology innovation! Eventually, if we take the human eye as the ideal benchmark, we need to achieve 16K!


I don't think he's advocating that we stick to 5k but rather that 5k is good enough for today's devices and as they improve we can up the resolution.

Advocating for higher-than-necessary resolutions today based on future prospects is a bad gamble, because consumer adoption is still in its infancy. What if you make a poor user experience today in anticipation of a better experience tomorrow, but instead kill the market so there is no tomorrow?

Video game developers and video content providers are both well versed in dynamically scaling to match the capabilities of the consumer.


Eh, is it wasting memory bandwidth? Sure, but let's not pretend a modern smartphone GPU is breaking much of a sweat rendering some 5k textures into a >x framebuffer.


If the video is streamed it's wasting internet bandwidth as well. If not then device storage.

High end mobile GPUs can handle 5k video at the expense of battery life but most GPUs can't.


Aliasing happens only when the signal that you are sampling has higher frequencies than what you can represent in your sampled data. So he is absolutely correct here: if the source material has higher resolution, the projection onto the lower resolution display would lead to the same kind of undersampling. Basically, you'd miss in-between pixels and if these have significant content, it becomes very noticeable (e.g. parts of very thin features missing from the image).


He did not say it causes aliasing, he said it is inviting aliasing.

More resolution is going to be fine if the pixels are filtered well. If they aren't, that could certainly invite aliasing. Pixel filtering, especially in a scenario like this, is going to have a wide spectrum of techniques that can work to varying degrees, with trade offs of quality and speed.


A pixel filter that takes into account the highly nonregular shape of the projected screen pixels and the warped projected video pixels in the source at the same time would be quite an interesting task.


I think the point was mostly if you are filtering to downsample, you should probably not have downloaded the bytes or have wasted the energy decoding.


The question I was replying to was about aliasing.


He talked about equirectangular projection. I wonder how he is achieving that, and with what kind of shader. I asked the question once, and it seems you cannot do that with a vertex shader, but maybe a geometry shader?

I already read articles about the Pannini projection, which is quite cool, but I guess Oculus is facing the same issues and has similar solutions...


Any shader.

You can render a single quad and calculate UV in the pixel shader. Geometrically, this is the most accurate method.

You can also do it without any shaders: write CPU code that builds a reasonably dense (about 8x8 pixels/triangle) static geosphere mesh with position and texcoord channels.

You can also build a reasonably dense grid mesh occupying just the screen with just the positions in clip space, and calculate UV channel in the VS based on the rotation.

GS will work too, but I'm not sure it makes sense performance-wise. People normally use GSes to implement small dynamic objects. For static geometry I wouldn't be surprised if an immutable buffer were faster, just because that's what hardware & drivers are optimized for. See e.g. this: http://www.joshbarczak.com/blog/?p=667
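A rough sketch, in Python rather than shader code, of the per-pixel math the "single quad + UV in the pixel shader" approach does: turn a view direction into equirectangular texture coordinates. The axis and wrapping conventions here are assumptions for illustration, not Oculus's implementation.

    import numpy as np

    def equirect_uv(direction):
        # Map a normalized view direction (x, y, z) to (u, v) in an
        # equirectangular panorama: longitude -> u, latitude -> v.
        x, y, z = direction
        u = np.arctan2(x, -z) / (2 * np.pi) + 0.5   # [0, 1) around the horizon
        v = 0.5 - np.arcsin(y) / np.pi              # [0, 1] pole to pole
        return u, v

    print(equirect_uv((0.0, 0.0, -1.0)))  # looking straight ahead -> (0.5, 0.5)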


These are the steps for transformation that may answer your questions:

1. Cubemap (source) - 360° videos are often sourced from six different images stitched together.

2. Equirect (video) - the cubemap is transformed to "equirect" trivially with a fragment shader, which maps every output pixel to some input pixel (a rough sketch of this step follows the list).

3. Perspective (output) - Mapping equirect video to a perspective projection is also done with a fragment shader, just using a different transformation depending on the focal point of the user.

4. Pannini (alternative output) - You mentioned this projection, and it's an alternative to the "perspective" projection that allows a wider output FOV that minimizes distortion in the periphery.
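Here is a sketch of step 2 above: for each output pixel of the equirect image, compute a direction from its longitude/latitude and look up the corresponding cubemap face and texel. Face names and sign conventions are illustrative assumptions; real engine layouts differ.

    import numpy as np

    def direction_to_cubemap(d):
        # The dominant axis picks the face; the other two components, divided by
        # the dominant magnitude, become face-local (u, v) in [0, 1].
        x, y, z = d
        ax, ay, az = abs(x), abs(y), abs(z)
        if ax >= ay and ax >= az:
            face, s, t, m = ('+x', -z, -y, ax) if x > 0 else ('-x', z, -y, ax)
        elif ay >= az:
            face, s, t, m = ('+y', x, z, ay) if y > 0 else ('-y', x, -z, ay)
        else:
            face, s, t, m = ('+z', x, -y, az) if z > 0 else ('-z', -x, -y, az)
        return face, 0.5 * (s / m + 1.0), 0.5 * (t / m + 1.0)

    def equirect_pixel_to_cubemap(i, j, width, height):
        # Longitude/latitude of equirect pixel (i, j) -> direction -> cubemap sample.
        lon = (i + 0.5) / width * 2 * np.pi - np.pi
        lat = np.pi / 2 - (j + 0.5) / height * np.pi
        d = (np.cos(lat) * np.sin(lon), np.sin(lat), -np.cos(lat) * np.cos(lon))
        return direction_to_cubemap(d)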


The transformation from cubemap to equirectangular projection seems unnecessary. Why convert? There's support for cubemap textures on modern hardware, and having a separate video stream per face would make culling them easy. It wouldn't be as fine-grained as Carmack's end result, but it would be simple and avoids resampling.


How would you achieve a 170-degree Pannini projection in real time using OpenGL? Do I really need a cube map?


You can use maps containing fewer faces than a cube, or any partial mapping that covers the 170 degree area you need.

I researched this a bit here: https://github.com/shaunlebron/blinky


It's no worse than the problem of stretching a texture over a sphere, using UV coordinates. It's doing it at this data rate that's new.

Mandatory XKCD: [1]

[1] https://xkcd.com/977/


This might not be possible if the "system software" doesn't cooperate, but it's possible to encode videos such that you can keep them "warm" without decoding all frames, for example in an IBBPBBPBBPBB structure where all B-frames are not referenced by any other frame (other arrangements are possible). Forcing this structure has a cost, but it's much smaller than having more I-frames. You can then alternate decoding 3 such streams (each one offset by 1 frame, including the I-frames - this is not a problem, it just means you'll not be ready to output anything for 2 frames after a seek) for the cost of 1. Switching to 60fps is then "instantaneous". Old iTunes used to code h.264 video like this (with a PBPBPB structure, so it could play at half-rate, which it did if the CPU couldn't keep up). Note that unreferenced does not imply B-frame, nor the other way around.

Another (admittedly crazy) idea, for a setup with a lower-res version and a higher-res overlay, is trying to store the difference only, affording a (significant?) bitrate reduction for the high res "patches". This is very tricky to do in practice, though (needs larger range or losing the lsb; the codecs aren't really designed for this). I don't think it has ever been done.
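A trivial sketch of the "keep streams warm" idea in the first paragraph: with a GOP structure whose B-frames are never referenced, only the I and P frames need decoding to stay ready, i.e. a third of the frames for an IBBPBBPBBPBB pattern. The pattern string below is just the example from the comment.

    def warm_decode_frames(gop_pattern="IBBPBBPBBPBB", gops=3):
        # Indices of the frames a decoder must process to keep a stream "warm":
        # every I and P frame; the unreferenced B-frames can be skipped entirely.
        types = gop_pattern * gops
        return [i for i, t in enumerate(types) if t in "IP"]

    frames = warm_decode_frames()
    print(len(frames), "of", 12 * 3)   # 12 of 36 -> ~1/3 the decode cost per warm stream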


I always wondered if HTM spheres are a worthwhile solution for lower sized textures. You start with 8 equilateral triangles that make up the sphere and subdivide as needed. You could possibly use a square texture made of 8 right triangles that each need minimal distortion (as opposed to an equirectangular texture).

https://arxiv.org/ftp/cs/papers/0701/0701164.pdf


The WorldWide Telescope uses this exact idea: http://www.worldwidetelescope.org/docs/worldwidetelescopepro...

There are also other interesting spherical tessellations that could be used - one is implemented in Google's S2 geometry library, the other, HEALPix, is used for astronomical datasets:

http://s2geometry.io/resources/earthcube http://iopscience.iop.org/article/10.1086/427976/pdf


Alternatively, you could maybe use a cube map. You would still have a bit of distortion, but that could be handled during the video generation anyway.


Distortion visible in cube maps goes doubly so when viewed in VR. Cube maps are easy but not optimal.


What's the source of the distortion?


And you could use ptex textures to have a regular pixel density.


Another approach to space efficiency would be to do away with the need for dual video streams entirely and just average the stereo images together to form a single monocular image. Then, send a disparity map along with the monocular video. Decode the mono video and use the disparity map to interpolate the view of either eye. You’ll have all the information you need for reconstruction and the disparity map can be efficiently compressed via normalization and perhaps even by sending just the vectors of the contours.

Another idea is to take advantage of the fact head motions are really just a translation vector of the camera. There’s no need to send pixels that have just transposed locations unless they have changed in time.

If I was designing such a system I’d try to take advantage of the fact there isn’t a lot changing fundamentally in the scene when you move your head, and maintain some sort of state and only request chunks of pixels that are actually needed. You wouldn’t even have to use a traditional video codec as the preservation of state would be far more efficient than thinking about things in terms of flat pixels and video.
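A minimal sketch of the mono + disparity idea from the first paragraph: forward-warp each pixel by half its disparity in opposite directions to approximate the two eye views. It deliberately ignores the occlusion/hole-filling problem discussed in the replies, and disparities are assumed to be integer pixel offsets.

    import numpy as np

    def reconstruct_views(mono, disparity):
        # mono: HxW (or HxWx3) image; disparity: HxW integer horizontal offsets.
        h, w = disparity.shape
        xs = np.arange(w)[None, :].repeat(h, axis=0)
        rows = np.arange(h)[:, None]
        left = np.zeros_like(mono)
        right = np.zeros_like(mono)
        lx = np.clip(xs + disparity // 2, 0, w - 1)
        rx = np.clip(xs - disparity // 2, 0, w - 1)
        left[rows, lx] = mono     # scatter; disoccluded holes are left black
        right[rows, rx] = mono
        return left, right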


> Decode the mono video and use the disparity map to interpolate the view of either eye.

By "disparity map" are you thinking something like a heightmap applied to the scene facing the viewer and then you use that to skew things for each eye?

If so, how would that handle parts of the scene that are occluded/revealed to one eye but not the other?


How does video encoding like H.264 handle parts of a scene that are occluded in one frame, but not occluded in the next frame?

A three inch difference between two cameras producing simultaneous frames is similar to a three inch sideways step of one camera in time between two frames.


True, occlusions would be a problem, but we're talking about fake autostereoscopic 3D here, where most of the stereo rigs used for capture have but a modest baseline. Almost all of the depth perception comes from disparity; occlusions would still in fact be very visible with the averaging method I described and would sit at whatever depth plane the occluder is at, which is probably a good guess anyway. It's not like your other eye would receive a correspondence from an occlusion in the real world.


FYI, there's online software[1] to recreate 3D/stereoscopic 3D imagery from the depth-enabled photos taken e.g. by Moto G5S (which has a dual-camera setup that computes the depth map, but no API to extract/store the image taken by the other camera).

My personal opinion is that true stereoscopic images feel better when there's enough detail; those occlusions do matter. For some imagery it doesn't matter as much though.

[1]http://depthy.me


First of all, the process of converting a stereo pair into a flat image and a disparity map would be lossy and introduce artifacts. Even assuming you could accurately capture pixels that were occluded from one viewpoint and not the other, the approach is inherently unable to handle effects such as partially-transparent or glossy surfaces.

Secondly, the limiting factor described in the post is not space efficiency, it's decoding performance. It doesn't do much good to halve the amount of data required to represent a frame if it takes twice as long to reconstruct the raw pixels for display.


The difference between two images can be non-lossy, if you like.

> partially-transparent or glossy surfaces.

It's all just RGB values; there is no gloss or transparency in an image. (Image layers can have transparency for compositing, but that's obviously something else.)

If audio encoding can have "joint stereo", why not visual coding.

Many areas of a stereo image are nearly identical, like the distant background.


I’m sorry for not detailing a perfect compression method in a single HN post. I’ll do better next time, promise. lol


Yeah, there's a lot you could do, but unfortunately the only thing that makes decoding high resolution video feasible on these devices is fixed-function video decoding hardware which can't support new ideas like this. You'd have to lobby standards bodies to add VR-specific features to codecs and wait many years for hardware to implement them.

Ultimately what you want for VR is some kind of light field video compression. You can get a taste for what that would be like here, although it's mostly still images for now: https://store.steampowered.com/app/771310/Welcome_to_Light_F...


I don't know if this assumption is true: 'the fact there isn't a lot changing fundamentally in the scene when you move your head'.


What you are suggesting is essentially a new codec. It sounds like a good idea, however, that's a thing for the future.

The "disparity map" you suggest seems to exist in 3D Blu-rays (Multiview Video Coding), but there may be some technical limitations that make it unsuited for the Oculus Go.


Sure, those ideas might hold up for objects far away from the eyes, but for nearby objects there can be a pretty big difference between what each eye sees. I think the human brain would quickly call BS on an image processed through that kind of compression, and it would not be very immersive or realistic.


There is a difference between what the successive frames of a video depict. Yet, video compression heavily relies on encoding just the differences between successive frames, which is very effective.


I think this more-or-less corresponds to the first approach you described. It's part of the Multiview Video Coding amendment to H.264: https://en.wikipedia.org/wiki/2D_plus_Delta


Would this be similar to joint-stereo audio encoding?


(Much hand-waving here)

I wonder if the problem isn't a misaligned paradigm for just what a codec does. Right now, codecs exist to take bytestreams and fill frames at a certain rate. They're a bitstream-to-framerate device.

What if instead of delivering to a frame during a certain time period, the codec had to instantly deliver whatever it could, but only geared to the actual acuity of the users retina. Seems like there would be less information, you would never have lag, and all optimization would be around the detail of data delivered, not framerate or screensize.

I also do birthday parties and Bar Mitzvahs, folks. I'm here all week. (I figure it couldn't hurt to throw crazy ideas out there. Every now and then a crazy idea actually amounts to something. Coding is cool because it not only lets you solve problems, sometimes it lets you change the universe the problem lives in. Good luck, John!)


only delivering the visible portion of a 360° video assumes that the video is ready to be decoded at any angle, which is the entire problem.


But the problem is to display something for a moving angle, not display something for everything. As best I can tell, it is a different problem.

I want to restate what I'm hearing so you can correct me. You are saying something like "Gee, if we could show any retinally-matched conic the user's pointing their eyeball at? To do that we'd have to have the entire scene rendered anyway"

I'm saying you can't do it now. It hurts when I do that. So stop doing that. Instead of trying to render stereo scenes quicker, try to render a moving conic real-time quicker. When the codec runs, it's optimizing areas of a frame changing over time. If instead it optimized possible vision movement paths, you'd end up solving the framerate and resolution problems. Then you could concentrate on optimizing the codec in a different way than people are currently trying to optimize it. It couldn't hurt, and there may be opportunities for consolidation if you look at the problem as visual-movement-path-rendering instead of frame rendering. I don't know. I just know it's a problem now. Set your constraints differently and optimize along a different line. Sometimes that works.

Does that explain the differences here? Or do you want me to start spec'ing out what I mean by a retinal-path codec? At some point this will be a bit over the top.

ADD: I'll say it a little differently. Our constraint now is "how fast can you render this frame". Codecs are built for rendering frames on screens where people watch movies and play games. But that's not the world we're trying to solve a problem in. They may be configured the wrong way. Instead, require that anything the eyeball is looking at should render consistently in, say 5ms.

This sounds like the same thing, until you realize that the eyeball can't look at everything at the same time. Different parts of the image are temporally separated. So if I update the image in back of your head once every 200ms? You're not going to know. As far you're concerned, it's all instant.

It becomes a different kind of problem.


You want to predict focal paths and encode video for them ahead of time?


I think we're close now.

I want to predict the temporal cost of focal path movement, then optimize the stream based on those temporal "funnels", I guess you'd call them. This is in opposition to looking at segments of the screen all equally and optimizing across frame changes. I don't care about frame changes. All I care about is how fast the eyeball can get from one spot to another -- and that's finite. If I can create the image I want instantly wherever the eyeball is looking, framerate and resolution issues are a non sequitur. (This would also probably scale to more dense displays easier, but I'm just guessing)


The slice approach should be pretty readily doable today with e.g. libx264 (which you can force to produce exactly N rectangular (full-row) slices, using i_slice_count), calculating N based on the resolution of the eleven vertically and horizontally stitched clips and their boundaries. (And with VP9, using some crafty tile-column/tile-rows setup, maybe...)

This is assuming the videos are stitched pre-encode, of course... From the post, it almost sounds as if the idea would be to stitch independent H.264 streams into a new unified one using slice mangling + slices... which would be pretty crazy stuff.

(As a side note, it's a shame flexible macroblock ordering is only in the baseline and extended profiles... I still don't understand that decision at all.)

EDIT: Dawned on me that the hard part is on the client/decode side. D'oh.
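A rough sketch of the libx264 route, driven through ffmpeg's libx264 wrapper, which exposes the slice count as -slices. It assumes the strips are already stitched into one equal-height layout and uses a made-up filename; x264 splits frames into roughly equal runs of macroblock rows, so unequal strips would need the library API (i_slice_count etc.) directly. This only covers the encode side; as the EDIT notes, the decode side is the hard part.

    import subprocess

    NUM_STRIPS = 11   # one H.264 slice per horizontal strip of the stitched frame

    subprocess.run([
        "ffmpeg", "-i", "stitched_strips.mp4",   # hypothetical pre-stitched input
        "-c:v", "libx264",
        "-slices", str(NUM_STRIPS),              # force N slices per frame
        "-g", "30",                              # short, regular GOPs ease view switching
        "sliced_output.mp4",
    ], check=True)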


If you have the keyframe always at the seek time, would that be enough? Because you said the delay is too long?

If yes, would it be possible to render, for every second (half second, or whatever resolution you like), a snippet which only contains the frames from that seek time until the next keyframe, and then tell the webserver "if someone seeks to time x, serve this snippet and, when done, jump to the original video"?

You could do that transparently with some FUSE fs.


John Carmack's attention to quality and detail is incredibly inspiring.


Legitimate question: are there “untouched nature” type simulations for one of the leading VR platforms? Sitting at the top of a mountain, different times of the day and weather. Exploring a huge, very realistic forest. Being a pigeon observing a busy street from above as you fly around and perch yourself on different spots.


I find it interesting that you went in a different direction than the earlier ideas discussed for this problem. I can see where this is simpler in many ways and probably more importantly a better fit for a lot of content than some of those early ideas.

You mentioned the overhead of handling many streams being a problem. Can you go into more detail on the type of overhead you saw, and do you feel this is something that can be addressed with better drivers or some application changes? Or do you think the only solution is packing multiple strips into a single stream using the encoding bit manipulation you mentioned? It seems a shame that hardware which can support more streams leaves them inaccessible to real-world applications.


Fascinating. Amateur question though: are there other ways to encode the images, besides pixels, that will ultimately be better suited to 360 content? Holograms came to my mind because they degrade gracefully as you toss out bits of the signal.


Lytro's light-field camera technologies have huge potential for 360 content and the photogrammetry industry, but they effectively sat on the technology doing very little with it (compared to their competitors), until Google hired some of the employees as the company shut down earlier this year.


"5120 x 5120 at 60 fps" here is why that's overkill. If you start putting dots on a sphere you can go in one direction turn 90 degrees, put 5120 another axis. However, if you tile like that with 5120 x 5120 you get a lot of wasted pixels at the poles.

If a 5120 pixel radius is fine... then circumference of sphere = 2 pi * r, and surface of sphere is 4 pi * r^2. So 5120 / 2pi = r; substitute for r > 4 pi * (5120/2pi)^2, simplify > (5120)^2 / pi ~= we need ~1/3 (i.e. 1/pi) of 5120^2.

However, I suspect you actually want more than 5120 pixel radius.


In the center of the lenses, the circumference resolution is a bit over 5120, but definitely less than 5760. Even at 5120, it is a bit overkill (and potentially aliasing) at the edges: https://twitter.com/ID_AA_Carmack/status/975198157838499840

You are off by a factor of 2 in your pixel calculation, because 5k x 5k is for a stereo pair of spheres. Equirect projections waste a fair amount, but compared to the 300% miss to get to 60 fps stereo, it isn't dominant.
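A back-of-envelope check of those numbers, assuming the 5120 x 5120 stereo frame is two 5120 x 2560 equirect images, one per eye.

    import math

    W = 5120                            # pixels around the full 360 degrees
    equirect_per_eye = W * (W // 2)     # 5120 x 2560 = ~13.1M pixels
    uniform_per_eye = W * W / math.pi   # 4*pi / (2*pi/W)^2 = ~8.3M samples at equator resolution

    print(equirect_per_eye / uniform_per_eye)   # ~1.571, i.e. pi/2
    # So the equirect layout carries roughly 57% more pixels than an ideal uniform
    # spherical sampling -- wasteful, but small next to the 300% gap to 60 fps stereo.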


Ahh, good to know.

Anything you can do about someone tilting their head? That feels like the biggest remaining immersion breaker with pre-rendered stereoscopic videos.


Need some form of RGB+D for that. I have a player for that, and sometimes it looks fantastic, but the silhouette edge artifacts can sometimes look really bad. Considering some ways of "relaxing detail around the edges".


Will we ever see dynamic resolution that tracks where you are looking and just lowers resolution outside the point of interest? Wouldn't that save a lot?



I thought I was smart for just 10 seconds of my life... thanks for the link


In the mid-90s I worked on a system that did that to maximize image processing power, CCTT: https://www.youtube.com/watch?v=HY5M1jM5ggw


The article is about doing that for pre-recorded video. For live 3D VR rendering, it's already in some game engines. Batman Arkham VR uses it (it's in the settings). More info https://www.youtube.com/watch?v=gV42w573jGA


> If you start putting dots on a sphere you can go in one direction turn 90 degrees,

Starting where? Turning where?

> However, if you tile like that with 5120 x 5120 you get a lot of wasted pixels at the poles.

You didn't mention tiling.

> If a 5120 pixel Radius is fine... then Radius of sphere = 2 pi * r, and Surface of sphere is 4/3 pi * r^2. so 5120 / 2pi =r => 4/3 * (5120/2pi)^2 => (5120)^2 /(3 pi) ~= we need ~10% of 5280^2.

More details on what you are trying to say at each step please.


Quick edit: it's 4 pi r^2; I was thinking of volume for a second there.

As to tiling, I am saying that if you cut a sphere into 5120 slices and the middle slice is fine with 5120 pixels, then the poles are going to have ~1 pixel on them. How you tile them is really a question of tradeoffs.


I think you should draw a diagram and link it.

> As to tiling, I am saying if cut a sphere into 5120 slices, and the middle slice is fine with 5120 pixels.

What kind of slices? Like sections of an orange? Like longitude?


I was thinking lines of latitude.


You don't need 5120 (for 360 degrees on X-axis) x 5120 (for 360 degrees on Y-axis) do you? If you cover 360 degrees in the X direction, then your Y-axis only has to be 180 degrees, because you don't need 'behind you' on the Y-axis as that is already covered by the X-axis' wide 360 degrees. Or am I making an error here?


It's stereo, so Carmack means 360°x180°x2 for two eyes.

One would assume there would be a lot in common between stereo cameras' views, so presumably there are compression efficiencies to be found.


> Directly texture mapping a video in VR gives you linear filtering, not the sRGB filtering you want.

Isn't this back-to-front? Generally filtering is considered to work better in a linear colour space. Using a sRGB texture will convert each pixel to a linear colour space, before the reconstruction filtering is done, AFAIK.
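Whichever way the article's sentence is read, here is a tiny numeric example of why the colour space used for filtering matters: averaging a black and a white texel directly in sRGB gives a value that decodes to about 21% physical luminance, whereas converting to linear light first, averaging, and re-encoding gives the perceptually expected result. These are the standard sRGB transfer functions, nothing implementation-specific.

    def srgb_to_linear(c):
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    def linear_to_srgb(c):
        return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

    naive = (0.0 + 1.0) / 2                                               # average in sRGB
    correct = linear_to_srgb((srgb_to_linear(0.0) + srgb_to_linear(1.0)) / 2)

    print(naive, srgb_to_linear(naive))   # 0.5 -> ~0.214 linear (too dark)
    print(correct)                        # ~0.735 sRGB, i.e. 50% linear luminance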


Not mentioned in the article, but the Exynos versions of the S8/S9 don't need these tricks.


No, Exynos has the same block limit as the snapdragon chips -- 4k60. The difference is that Exynos doesn't have the same 4096 maximum dimension limit, so it can do 5120x2560 (monoscopic) at 30 fps, while snapdragon can only decode 4096x2560 at 30 fps. The view dependent player is about playing 5120x5120 (stereo) at 60 fps.


It can play a HEVC file that is 6144x3072 at 60fps.


I just tried, and while it does decode a 6kx3k 60 fps video, which is very admirable, it doesn't hold 60 fps while doing it. There are probably encoding options to minimize work on the decoder that could let you push it a bit more. MediaExtractor seems to be arbitrarily limited to a lower resolution, but that can be bypassed.


Can you try with Skybox VR? At least on my Note 8, Oculus Video really doesn't like H.265 video that's more than 4K.


I agree with you.

VLC on the Note 8 (and the S8) can do 8K decoding at 48fps, as is demoed here: https://vimeo.com/254723180

We've spent quite a bit of time getting that right, though, and those are rare devices (and it does not work with the Qcom version).

Granted, that's one video with a low bitrate (5 Mbps), but it worked.


Retina has 100 million photoreceptors per eye. Also, need higher FPS. Maybe 120 is good enough?


Most of the need for high frame rate is simple lag reduction during head motion. Sensitivity to frame rate and flicker has been extensively studied by SMPTE and others over the past 100 years. The research that went into 24 fps playback for movies (projected in darkened rooms, where the eye is less able to detect framerate-induced flicker) and 50 or 60 fps playback of interlaced video for TV (in well-lit rooms, where visual sensitivity to flicker is higher) is quite good. The work that went into measuring human visual acuity for use in defining HD broadcast specs was similarly good, and the human eye hasn't changed much in the past 100 years (except to end up with weaker muscles for shifting between near and far focus). Simply rattling off $BigNum or %maxInt% as resolution and framerate targets isn't the way to set standards.


It's hard for me to go back to a 60hz display after using 144, and going from 240 to 144 feels like going from 144 to 60.

I played breath of the wild and it was almost bad enough to keep me from playing. I think 60fps is fine for this, but in the future hope to see higher refresh rates as an option.

My main use case from VR has been escaping reality. Having multiple monitors floating in space allows me to prevent distractions. I fell asleep in VR once. I woke up looking at the stars, it was very confusing but interesting.


To add my anecdata: you don't happen to be diagnosed with something like ADHD-PI/ADD?

Almost everyone I've met or talked with that can notice the difference between those kind of framerates seems to either be diagnosed with it, or at the very least clearly show the associated traits.


I do, actually. Along with HFA. I also play games at a competitive high level, which can contribute to it. (top 10 pubg in a season, a+ esea CS, diamond siege etc.) It's so common in these communities to have at least 120hz which makes the most sense if you play CSGO ESEA. I wish Siege would improve interp and go to 120hz instead of 60. It's atrocious the things you die to.

I think most people I associate with are diagnosed ADD, but they're also high skill gamers.


Play, seriously, a competitive shooter/fps and you'll also start noticing the difference. 144hz is buttery smooth compared to 60hz.


What you say is correct for some, but not all people. In FPS games many don't see a huge visual difference between 60Hz and say 140Hz, except for latency.

That's not true for everyone. For me and a few others, there's even a clear difference between 120 and 144 Hz, and compared to 60 Hz it's a gulf. A refresh rate of 60 Hz in a high-paced, close-up fight becomes a bit of a Where's Waldo situation. It's as if my brain stops doing motion estimation because the frames don't really fit together, so I end up stitching together the individual frames and trying to guess what the next 'slide' is going to show.

I know there are a few others with the same issue, and anecdotally it might be a trait that occurs with some forms of ADHD/ADHD-PI. But I don't have a good reference on that, only experience and the people I've met.


Correct me if I'm wrong, but isn't the difference between 60Hz and 120Hz refresh more or less imperceptible to the average human? I'm sure there's a distribution, but I'd be hard pressed to find a person who could differentiate between 100Hz and 120Hz refresh. It seems like a waste to push rendering beyond the point at which we can even tell there's a difference.

Edit: Thanks for the feedback, I guess it is perceptible. Nevertheless, I think my argument becomes valid at some N. Sure, N !== 60, but N = 144 or 120 may be more reasonable. I'm not too concerned with what N is, more so with the fact that "doubling the refresh rate" eventually becomes an act of futility.


When discussing framerates or refresh rates people tend to make a mistake of not differentiating between interactive and non-interactive mediums. In a video it's just a matter of smoothness. In a video game there is a more important aspect of input delay and the overall motion-to-photon latency, which is greatly affected by frame times.

There is also a misconception of treating visible motion details and temporal artifacts as the same thing ("what can humans see?"). There are diminishing returns around 100 FPS for motion details, but that's still far too low to eliminate artifacts like blurring or judder (in VR usually connected to the vestibulo-ocular reflex). This is why current VR headsets already have effectively 300+Hz-like persistence. We may need something like 1000 FPS, if not more, to achieve clean vision at full persistence, so obviously strobing tricks are necessary to get around it. And you don't need to be a fighter jet pilot to see it. Everyone can notice these problems.


It is definitely perceptible. There is definitely a falloff in value/cost. In the most extreme case, your eyes can tell the difference between a solid-lit LED vs a 1000Hz strobing LED in a dark room if you dart your gaze left and right.

The most important quality issue is for the image update to match the display strobe. That's why FreeSync/Gsync is such a big deal.


60 and 120 are worlds apart, especially in VR. I would be shocked if many people with healthy mind and vision couldn't see the difference after having the basic idea explained to them. In VR some people might even physically feel the difference.

Doubtless there is some refresh rate where it ceases to matter, but the point here is that we're still struggling to push an acceptable frame rate in VR (and 4k, and other high resolution "formats") without making significant sacrifices in other aspects of the video quality. Assuming that image quality and available processing power both continue to increase, it will continue to be important to include framerate in the balance, i.e. we want to draw enough but not be wasteful.


I find it very perceptible. Get access to a high refresh monitor and visit the ufo motion test[0].

Even on your current 60Hz monitor, you can see how the image is blurry in the 60Hz band. On 144Hz I could see the pixel-level details (on 1080p) almost as clear as a motionless image.

[0] https://www.testufo.com/


In regards specifically to the UFO test... the blurriness is generally not related to frame rate but to LCD persistence. I.e., on a theoretical zero-persistence 60Hz display you would not likely notice the individual frames getting blurred from frame to frame. In this case the only real difference between a 60Hz and a 120Hz display is that the 120Hz display would have additional frames of motion. Your visual system would then have fewer frames to interpolate between.

In reality though, all LCD displays have persistence issues. (You end up seeing bits of the previous frame suspended over the current frame).

Higher quality monitors designed for higher display frequencies tend to also be tuned to have less persistence per frame. There are also tools/tricks some brands employ to remove / reduce the persistence (and blurring), which can be effective (or annoying, depending on how they do it) even at 60hz. There are also "120hz" and higher displays with such bad persistence issues that they are way worse off on the ufo test than a good low persistence 60hz display.

As an easy example. IPS panels tend to have a lot more trouble switching between frames quickly than VA or TN panels, so they also tend to exhibit more persistence issues per frame. This is then directly noticeable on the UFO test between these kinds of panels. IPS panels have of course been getting a lot better in this regard in recent years!


Depends on the context and how involved the person is in the experience.

There has been a bunch of research involving VR-induced nausea indicating that a framerate of 90 or higher reduces the incidence of nausea compared with 30 and 60.

For example: https://www.researchgate.net/publication/320742796_Measureme...


Buying a 144Hz display was both a great and horrible decision for me. Great because games at 144Hz feel extremely smooth. Horrible because now I can't enjoy 60Hz.


60 to 144 was insane the first time. Then I went to a 200 Hz ultrawide, back down to 144, then to 240. 144 feels like 75 now, and 60 feels like 50.

It's painful to go back.


144-240? This sounds like a Placebo effect. While I'm sure people out there can distinguish between 60 and 120[0], it seems you must be incredibly attuned to visual sensory input (an outlier). It seems you aren't alone, either[1], part of me wants to think this is one big marketing gimmick, but surely people wouldn't buy 240Hz monitors if they couldn't tell a difference? Or is it just a pissing contest for self satisfaction? If I had a few million in the bank, maybe I would also buy a 240Hz monitor -- because why not?

[0] https://www.pcgamer.com/how-many-frames-per-second-can-the-h...

[1] https://www.reddit.com/r/Competitiveoverwatch/comments/5mhqh...


Look at this [0]

It's an interpolation thing; it's much easier for your brain to track an object when it smoothly moves around, instead of having to do interpolation/extrapolation, and this should demonstrate why.

Can I ask why you think it sounds like a placebo? I don't really see why there's any logic behind 144hz being the ceiling of how well your eyes can see, and 144hz -> 240hz is a big jump

[0]: https://www.youtube.com/watch?v=pUvx81C4bgs


You can tell a difference between 144-240. A lot of small differences, but there is a difference. I just wish there were a nice high-tickrate game to play other than CS.

144 to 200 was not very noticeable. 144 to 240 was absolutely noticeable. There are a couple of games that dual 980 Tis can't reach 240 in currently; hopefully Volta changes that. The monitor was also on sale, so I got it for a crisp $300 with no tax.


The 120Hz display on my iPad Pro gives me the same feeling, and I don't use it for anything important.

Using my iPhone immediately after makes the phone display feel cheap for a bit.


Is there any public research about the inverse correlation between framerate and nausea? If there is a perceptible one in that range, it would mean that higher framerate does make a difference, even without being aware of it. This would only be worthwhile for VR.


It is definitely perceptible. Compare a normal iPad (60 Hz) to an iPad Pro (120 Hz), the fluidity of movement is very apparent in just playing around with the home screens.


It may be perceptible, but the number of factors that could affect the different systems in that comparison means it's not exactly a good test. At the simplest level, there's no guarantee it's actually even rendering updated frames at the rates in question if it's limited by some other factor, and the differing hardware may change at what point that limit is hit.


You can drop frames at 60Hz as well.

But really, when you achieve 120Hz, it's beautiful; it reminds me of when Retina displays came out. We are a bit closer to realistic rendering.


> You can drop frames at 60Hz ss well.

Yes, what I was trying to get at is that just because the hardware is capable of 60 frames in a second, that doesn't mean the software was delivering 60 frames a second. The iPad Pro has a different processor than the iPad (A10X Fusion vs A10 Fusion), and in a lot of tests it's significantly faster.[1]

The iPad Pro does have more pixels to push around, but that doesn't exactly negate the CPU difference; it just makes it more complicated to draw an actual comparison. And that's before we even get to the actual graphics processor, which itself could do a better job of offloading some processing to hardware (better OpenGL/Metal/whatever support). For all we know, you were seeing an average of 35 updated frames a second on the iPad, and you're now seeing an average of 55 updated frames on the iPad Pro. In that case, the doubling of the screen refresh might help a little (reducing noticeably laggy frames a bit, as it can update between what would be frames at 60Hz), but it wouldn't be earth-shattering. I doubt it's that bad, but as an example, this should show how a Hz rating for what a screen is capable of doesn't mean much.

The real benefit of higher screen refresh rates is to better support different lower native refresh rates. Much video content is at 24 FPS. A 30Hz or 60Hz screen can't represent that faithfully, and will need to double some frames. A 120Hz screen can perfectly represent 24 FPS content[2], and that's the real reason screens (and TVs) ship with that refresh rate. Different media (television, internet video, DVDs, Blu-Rays, video game systems, etc) all have different refresh rates they want to deliver.

1: https://www.notebookcheck.net/A10-Fusion-vs-A10X-Fusion_8178...

2: I'm ignoring that it's often actually 23.976 FPS or something.
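A quick sketch of the cadence point above: how many refreshes each 24 fps frame occupies on a 60 Hz versus a 120 Hz panel. The uneven 2/3 pattern at 60 Hz is the familiar pulldown judder; at 120 Hz every frame gets exactly five refreshes.

    def repeat_pattern(content_fps, display_hz, frames=6):
        # Number of display refreshes assigned to each of the first few content frames.
        edges = [i * display_hz // content_fps for i in range(frames + 1)]
        return [b - a for a, b in zip(edges, edges[1:])]

    print(repeat_pattern(24, 60))    # [2, 3, 2, 3, 2, 3] -> uneven cadence (judder)
    print(repeat_pattern(24, 120))   # [5, 5, 5, 5, 5, 5] -> perfectly even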


Yeah, it's night and day.


The human eye is practically capable of seeing around 8 megapixels, which is a similar number of pixels to what a 4K TV has (but obviously with a completely different distribution). There was a great episode about it on the Vsauce YouTube channel. We are pretty blind, but saccadic masking does the magic to make our eyes seem super efficient.


That resolution is not evenly distributed throughout our FOV, but rather is highly concentrated in the center of our vision. This makes "similar number of pixels" pretty meaningless.


With John Carmack at the problem, this is definitely something I'll be keeping a close eye on. For people that are unaware, he's one of the original creators of Doom, and his innovations in computer graphics are legendary.


Not disparaging, but how likely is it that someone who doesn't know who John Carmack is reads Hacker News? Surely the Venn diagram of those two sets doesn't have much overlap...


You're probably underestimating how popular HN is.

And how old you are (ducks) But seriously it's 2018! It's the distant future where Doom is run in emulators in browser windows.


What’s interesting is that the technique he employs here (blurring the top and bottom of the sphere) sounds like something similar to what I remember reading from "Masters of Doom".

Namely, that when he created Wolfenstein 3D (Doom’s predecessor), he basically only rendered the walls + sprites and not the floor or ceiling, because computers + graphics engines weren’t as capable at the time.


What's a doom?


One of the first games to do real time texture mapping on fairly arbitrary 3D surfaces, back in 1993 when you had to do it all on the CPU because 3D graphics cards didn’t exist except as super-expensive prototypes found in military flight simulators and the like. Full of clever tricks to make this something you could do in real time on a 486.

It was influential enough that for many years, the genre we now call “first person shooters” was known as “Doom clones”.


What exactly do you mean by "games"?


“Games” in this context is short for “video games”, which is the plural of “video game”. A “video game” is a piece of software designed to present an entertaining challenge to the person or people using it.


I am somewhat shocked Oculus is wasting Carmack's time and potential on something nobody wants (watching 360 panorama movies).


I like 360 panorama movies. Done right, the feeling of presence is excellent. VR is a niche to begin with, so I would hesitate to say that any one application (ex. CAD, gaming, 360 panoramas, etc) is the "killer app" so far since the audience is so small. As it grows (which I pray it will), we'll see definitive trends emerge.


The feeling of presence is so limited compared to realtime-rendered 3d spaces where you can move your head around from the central camera point even just a bit. Adding some artificial parallax shift to the movie frames might be enough to give it that extra oomph to feel truly immersive.


The Oculus Go doesn't have 6 DoF, and he's been working on that and phone-based VR for the last few years. Because they're cheaper, they move a lot more units! And because they're more constrained, they need more optimization attention from someone like Carmack.


> VR is a niche to begin with

I would prefix that with "in its current state".


It's been a niche thing for 30 years or so. But any day now...


Given that Carmack is CTO, I'd assume he has a fair amount of say in what he works on.


VR hasn't necessarily had its killer app yet. The hype has died down somewhat, and the slow build starts. It is a new medium, so something impressive that people can use it for, like watching a concert from the middle of the stage, is important to keep it from being all potential.


I want it. Also looks like prerequisite for other fancy stuff.


Where do you get your demand information?



