NVIDIA RTX: is ray tracing necessary in computer games?

Part 2: Rasterization and Ray Tracing
In this article, we'll take a closer look at what happens to a 3D world after all of its vertices have been processed. Once again, we'll have to dust off our math textbooks, get comfortable with the geometry of frustum pyramids, and solve the puzzle of perspectives. We'll also dive briefly into the physics of ray tracing, lighting, and materials.

The main topic of this article is the important stage of rendering in which the three-dimensional world of points, lines and triangles becomes a two-dimensional grid of multi-colored blocks. For the most part this process goes unnoticed, because the 3D-to-2D conversion happens behind the scenes, unlike the process described in the previous article, where we could immediately see the impact of vertex shaders and tessellation. If you're not ready for that yet, you can start with our 3D Game Rendering 101 article.

Preparing for two dimensions

The vast majority of readers are viewing this website on a completely flat monitor or smartphone screen; but even if you own something more modern, such as a curved monitor, the picture it displays still consists of a flat grid of multi-colored pixels. Yet when you play the new Call of Mario: Deathduty Battleyard, the images appear to be 3D: objects move around the scene, growing larger or smaller as they move closer to or further from the camera.

Taking Bethesda's 2015 Fallout 4 as an example, we can easily see how vertices are processed to create a sense of depth and distance;
this is especially noticeable in wireframe mode (see above). Take any 3D game from the past two decades: almost every one of them performs the same sequence of actions to convert a 3D world of vertices into a 2D array of pixels. This conversion is often called rasterization, but it is only one of many steps in the entire process.

We need to break down the different steps and examine the techniques and calculations involved. We'll use the sequence used in Direct3D as a reference. The image below shows what happens to each vertex of the world:

Direct3D transformation pipeline
In the first article [translated on Habr] we saw what happens in world space: there, using various matrix calculations, vertices are transformed and colored. We'll skip the next stage, because in camera space the vertices are only transformed again, shifted so that the camera becomes the reference point.

The following steps are too complex to skip because they are absolutely necessary to make the transition from 3D to 2D - if done correctly, our brain will be looking at a flat screen, but "seeing" a scene that has depth and scale. If you do it wrong, the picture will turn out very strange!

It's all about perspective

The first step in this sequence is to define the field of view from the camera's point of view.
To do this, you first need to set the angles of the horizontal and vertical field of view; in games it is usually the horizontal one that can be changed, because people have better horizontal peripheral vision than vertical. We can get a feel for this by looking at an image of a person's field of view:

The two field-of-view angles (fov) define the shape of the frustum, a 3D pyramid with a rectangular base emanating from the camera. The first angle sets the vertical fov, the second the horizontal; we denote them by the symbols α and β. That's not exactly how we actually see the world, but it's computationally much easier to work with a frustum than to try to generate a realistic volume of visibility.
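As a rough illustration of how the two angles are linked, here is a minimal sketch (not from the original article) that derives the horizontal fov from the vertical one and the viewport aspect ratio; the function name and the 59° example value are just illustrative assumptions.

```python
import math

def horizontal_fov(vertical_fov_deg: float, aspect: float) -> float:
    """Derive the horizontal field of view from the vertical one and the aspect ratio."""
    half_v = math.radians(vertical_fov_deg) / 2.0
    half_h = math.atan(math.tan(half_v) * aspect)   # tan(beta/2) = aspect * tan(alpha/2)
    return math.degrees(2.0 * half_h)

print(horizontal_fov(59.0, 16 / 9))   # roughly 90 degrees for a 16:9 viewport
```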

You also need to set two more parameters - the location of the near (or front) and far (back) clipping planes. The former cuts off the top of the pyramid and essentially determines how close to the camera things start being drawn; the latter determines how far from the camera primitives are still rendered.

The size and location of the near clipping plane is very important, because it becomes what is called the viewport. Essentially, this is what we see on the monitor, i.e. the rendered frame, and in most graphics APIs the viewport is drawn starting from its top-left corner. In the image below, the point (a1, b2) is the origin of the plane: the plane's width and height are measured relative to it.

The aspect ratio of the viewport is important not only for displaying the rendered world, but also for matching the aspect ratio of the monitor. For many years the standard was 4:3 (or 1.3333... in decimal form). However, today most people play in 16:9 or 21:9 aspect ratios, called widescreen and ultra widescreen.

The coordinates of each vertex in camera space must be transformed so that they all fit on the near frustum plane, as shown below:

Frustum Pyramid Side and Top
The transformation is performed using another matrix called the perspective projection matrix. In the example below, we use the field-of-view angles and the clip-plane positions to perform the transformation; however, you can use the viewport dimensions instead.

The vertex position vector is multiplied by this matrix, giving us a new set of transformed coordinates.
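To make that multiplication concrete, here is a hedged sketch in the style of Direct3D's left-handed, row-vector projection matrix (the D3DXMatrixPerspectiveFovLH layout); other APIs arrange the matrix differently, and the fov, aspect and clip-plane values below are made-up examples rather than anything from the article.

```python
import numpy as np

def perspective_fov_lh(fov_y_rad, aspect, z_near, z_far):
    """Build a left-handed perspective projection matrix for row vectors (v @ M)."""
    y_scale = 1.0 / np.tan(fov_y_rad / 2.0)     # cot(fovY / 2)
    x_scale = y_scale / aspect
    return np.array([
        [x_scale, 0.0,     0.0,                                0.0],
        [0.0,     y_scale, 0.0,                                0.0],
        [0.0,     0.0,     z_far / (z_far - z_near),           1.0],
        [0.0,     0.0,     -z_near * z_far / (z_far - z_near), 0.0],
    ])

proj = perspective_fov_lh(np.radians(70.0), 16 / 9, 0.1, 1000.0)
vertex = np.array([1.0, 2.0, 5.0, 1.0])          # a point in camera space (x, y, z, w)
clip = vertex @ proj                              # row vector times matrix
print(clip)                                       # homogeneous clip-space coordinates
```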

Voila! Now all vertices are written in such a way that the original world is represented as a 3D perspective, and primitives near the front frustum plane appear larger than those closer to the back plane.

Although viewport size and viewport angles are related, they can be handled separately. In other words, you can define a frustum to produce a near frustum plane that is different in size and aspect ratio from the viewport. To do this, an additional step is needed in the chain of operations in which the vertices in the near clipping plane must be transformed again to account for this difference.

However, this may distort the apparent perspective. Using Bethesda's 2011 game Skyrim as an example, we can see how changing the horizontal field-of-view angle β while keeping the same viewport aspect ratio has a dramatic effect on the scene:

In this first image we set β = 75° and the scene looks completely normal. Let's now try setting β = 120°:

Two differences are immediately noticeable - firstly, we now see much more at the sides of our "field of view"; secondly, objects now seem much more distant (especially the trees). However, the visual effect on the surface of the water now looks incorrect, because the process was not designed for such a wide field of view.

Now let's imagine that our character has alien eyes and set β = 180°!

This field of view creates an almost panoramic scene, but it comes at the cost of serious distortion of objects rendered at the edges. This again was due to the fact that the game designers did not foresee this situation and did not create the game's assets and visuals for this viewing angle (the standard value is approximately 70°).

It may look like the camera has moved in the images above, but it hasn't - the only change is that the frustum has been modified, which in turn has changed the dimensions of the near frustum plane. Each image keeps the aspect ratio of the viewport the same, so a scaling matrix is applied to the vertices to make sure everything fits within it.

Rays of happiness - simply about ray tracing

The announcement of NVIDIA's gaming video cards based on the Turing architecture raised many questions, and even some time after the release of the "older" models, many of those questions remain relevant. Budget video cards in this series have not yet been announced, and prices for the RTX 2080 Ti, 2080 and 2070 remain extremely high. At the same time, most of the games that could clearly demonstrate the main feature of the new generation of video cards have either not yet been released (Atomic Heart), or are due to receive support only in the future (Shadow of the Tomb Raider).
In this article we will try to figure out why it was support for ray tracing that caused such a stir around the new video cards, rather than the other innovations - GDDR6 memory, VirtualLink, NVLink and 8K HEVC.

What is the idea behind ray tracing?

The essence of the technology sounds quite simple: it tracks the interaction of rays with the surfaces they fall on. Accordingly, rays can be reflected, refracted, or pass straight through.

NVIDIA presentation at gamescom, where they showed real-time ray tracing capabilities using Battlefield V as an example.

As you can see, the main difference is that reflections of the fire appeared on other objects. These reflections were caused by a shot from a tank gun. In other words, a new light source was added, and the rays emanating from it were reflected in the glossy body of the car, in the rim of the remaining wheel and in the puddle. And however strange such a fire may look, even against the backdrop of previous Battlefield games, the tracing effects themselves were shown very clearly and spectacularly.

But in order to better understand the scale of innovations that may await us in the future, let’s take a short excursion into history.

How was the technology born?

The idea of ray tracing itself is far from new and has been used quite successfully in the field of modeling, or more precisely, in visualization and rendering.
It all started with the ray casting method, which was originally created for radiation calculations, i.e. tracing gamma rays. The first rendering variant was presented in 1968 by the scientist Arthur Appel. The essence of the method was to cast a ray from the viewpoint (one ray per pixel) and find the closest object blocking its further propagation. Based on this data, computer graphics algorithms could determine the shading of that object. The term ray casting itself appeared only in 1982.

When the computer graphics for the 1982 film Tron were created, it was ray casting that was used.

The next important stage began in 1979. The point is that ray casting algorithms traced the path of a ray from the observer only until it collided with an object. The scientist Turner Whitted took the process further: in his algorithm, a ray could spawn three new types of rays after hitting a surface - reflection, refraction and shadow. Ray tracing is therefore a more complex set of tasks: it not only uses ray casting to find the intersection point of a ray and an object, but also computes the secondary and tertiary rays needed to calculate reflected or refracted light.

In the early 1980s, a group of professors and students at Osaka University created LINKS-1, a computer built from 514 microprocessors. The machine was designed to produce three-dimensional graphics using ray tracing. In 1985, a planetarium video modeled entirely on LINKS-1 was shown at the Fujitsu pavilion.

This is what the Fujitsu pavilion looked like back then.

In 1984, BRL-CAD, a modeling system created by the US Ballistic Research Laboratory, was demonstrated. Three years later a ray tracer was added to it, notable for its good optimization: overall rendering performance reached several frames per second, albeit spread across several shared-memory machines. BRL-CAD is now an open source program and is still updated occasionally.

Where has tracing been useful?

This is the Walkie-Talkie, a skyscraper on Fenchurch Street in London.

The building's concave facade reflects and focuses sunlight so strongly that things melt on the street below, and passers-by have even fried food on the sidewalk. One of the victims was a parked Jaguar XJ, whose mirrors and badge melted from the heat.

But the Walkie-Talkie isn't the only structure that has caused problems with reflected sunlight; others include the Walt Disney Concert Hall in Los Angeles and the Vdara Hotel in Las Vegas. The effect has been nicknamed the "death ray". In 2020, NVIDIA cited these buildings as examples of a mistake that could have been avoided with its new physically based rendering technology.

It turns out that tracing is already used in 3D modeling, but during this time the hardware has become more powerful and the tasks more complex. Let's talk about the difficulties.

Ray tracing issues

The main problem with ray tracing is performance.
Computing the behavior of a single ray is nothing difficult for a computer. But even in a scene with one light source and a handful of objects, say ten, there is a huge number of rays, and after every change of camera position all of them have to be recalculated. In complex scientific simulations or film production (where path tracing is used, but every camera movement is known in advance), it is acceptable for computers to spend a long time rendering each second of footage.

This is what we expect from real-time tracing.

The role of Unity, Microsoft and NVIDIA in what we see today

Now we have come to the point where we should start talking about real-time tracing.
In games, the position of our character is constantly changing, and the objects themselves also move. All this makes an already difficult performance situation even worse. In 2008, Intel showed demos of the research project Quake Wars: Ray Traced, based on the content of Enemy Territory: Quake Wars. Performance was rated at 14-29 fps on several quad-core processors and 20-35 fps on six-core processors. The video card was also from Intel, based on the Larrabee architecture, whose final products never went on sale.

In 2009, NVIDIA announced OptiX, a free software package for ray tracing on video cards. Compatible programs include Adobe After Effects, Autodesk Maya, 3ds Max and others.

The recent history of ray tracing in games began with Brigade, a game engine that could demonstrate decent real-time ray tracing results. Of course, they were not as pretty as an Unreal Engine 4 scene with static lighting, but in Brigade you could change the number and characteristics of light sources and see the result immediately, whereas in UE4 a comparable final image required offline rendering with the then-current version of V-Ray.

Such results could not go unnoticed, and Brigade became part of the OctaneRender engine, which in turn was integrated into the familiar Unity. Unreal Engine, for its part, adopted GPUOpen, a software package offering advanced visual effects.

Microsoft extended the DirectX 12 API with DXR (DirectX Raytracing). Later, AMD (the creator of GPUOpen) announced ray tracing support for the Vulkan API.

And in 2018, NVIDIA announced and released gaming video cards based on the Turing architecture, which adds RT cores dedicated specifically to ray tracing, as well as tensor cores. The latter were inherited from the previous architecture, Volta, which appeared in only two very expensive products (Titan V and Quadro GV100). Tensor cores are designed to accelerate deep learning workloads.

About performance

As we remember, any time the camera moves, all the rays in the scene have to be recalculated. If the camera freezes on one spot, the calculation of that frame does not stop: the image keeps being progressively refined, so after a couple of minutes of inactivity we see a much cleaner result. Even on weaker cards than the RTX series, after a few seconds you can get an idea of what the final picture will look like, only it will contain a large amount of "noise".

An example of the Brigade engine in action. The graininess of the picture is noticeable.

And here OptiX comes back into the picture: since version 5.0 it has included an AI-Accelerated Denoiser, a technology that uses trained neural networks to produce a picture close to the one a full trace would give. Computationally this approach is much cheaper, but the final result is worse than "honest" tracing.

Denoiser in action.

What do we have today?

In games that don't have ray tracing, the Turing series delivered the typical generation-switch performance boost of around 20% (though not without some surprises).
Of the games with ray tracing we so far have only Battlefield V. It is worth noting that performance drops significantly when RTX settings are enabled. As for the picture, it is better to compare for yourself what shipped with what was shown at the presentation.

At the presentation the difference was very noticeable.

In professional software, as expected, the changes produced results for the better. But when choosing, remember that the increase is not the same in all programs: in some places it is up to 20% (maybe higher), and in others it is a tenth of a percent. For example, in OctaneRender, the Spaceships scene was processed 12% faster on the RTX 2080 compared to the GTX 1080 Ti.

More examples

In Atomic Heart, in addition to the softer shadows, you can notice that in the version without RTX, to the right of the robot, it looks as if someone threw a carton of kefir at the wall. With RTX settings enabled, the light from the sources in that area is more or less even.

Robot from Atomic Heart.


In Metro: Exodus, ray tracing could even affect the atmosphere. Personally, I think the new look is too cheerful, but this is only noticeable when comparing “before” and “after”.

Exterior of a house from Metro: Exodus.


***

Real-time ray tracing can definitely be an important step on games' path to photorealistic images. We look forward to the arrival of realistic shadows, proper reflections in mirrors, and the ability to spot an enemy behind you in a polished surface.

It is still too early to say whether it would have been better to postpone real-time ray tracing for a few more years, until mature products were available on the video card, software and game fronts. Much also depends on whether AMD and Intel join in: competition would give more confidence that current developments will not be forgotten with the release of the PlayStation 5 and Intel's video cards.

In any case, Unity in its report pointed to the end of 2020 as the stage when real-time ray tracing would only just be starting to appear in games; according to the company, widespread adoption will come somewhat later.


So are you staying or are you going?

After performing the transformations at the projection stage, we move on to what is called clip space. Although this is done after the projection, it's easier to show what happens if we do the operations beforehand:

In the picture above we can see that the rubber duck, one of the bats and part of the trees have triangles inside the frustum;
however, the other bat and the farthest tree are outside the frustum. Although the vertices that make up these objects have already been processed, we will not see them in the viewport: they get clipped. With frustum clipping, all primitives outside the frustum pyramid are removed completely, and those lying on its boundaries are converted into new primitives. Clipping doesn't improve performance very much, because all those invisible vertices have already been processed before this step, in vertex shaders and the like. If necessary, the entire clipping step can even be skipped, but not every API supports this (for example, standard OpenGL will not allow you to skip it, though it can be done with an API extension).

It's worth noting that the position of the far clipping plane in games is not necessarily equal to the draw distance, because the latter is controlled by the game engine itself. The engine also performs frustum culling - it runs code that determines whether an object lies within the frustum and whether it will affect visible objects; if the answer is no, the object is not sent for rendering at all. This is not the same as frustum clipping: clipping also discards primitives outside the pyramid, but only after they have already gone through the vertex processing stage. With culling, they are not processed at all, which saves quite a lot of resources.

We've done all the transformations and clipping, and it looks like the vertices are finally ready for the next step in the rendering sequence. But in reality this is not the case, because all the calculations carried out in the vertex processing stage and in the transformations from world space to clip space must be performed in a homogeneous coordinate system (i.e. each vertex has 4 components, not 3). However, the viewport is entirely 2D, meaning the API expects the vertex information to contain only values for x and y (although the z depth value is preserved).

To get rid of the fourth component, a perspective division is performed, in which each of x, y and z is divided by the value of w. This operation constrains x and y to the range [-1, 1] and z to the range [0, 1]; the results are called normalized device coordinates (NDC).
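A small sketch of that perspective divide, assuming a Direct3D-style clip space where z ends up in [0, 1]; the clip-space values below are the (rounded) output of the projection sketch shown earlier.

```python
import numpy as np

def perspective_divide(clip: np.ndarray) -> np.ndarray:
    """Divide x, y and z by w to obtain normalized device coordinates (NDC)."""
    x, y, z, w = clip
    return np.array([x / w, y / w, z / w])

ndc = perspective_divide(np.array([0.803, 2.856, 4.900, 5.0]))
print(ndc)   # x and y land in [-1, 1] for visible points, z in [0, 1]
```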

If you want to take a closer look at what we just explained and you like math, then read Song Ho Ahn's excellent tutorial on the topic. Now let's turn these vertices into pixels!

Mastering rasterization

As with the transformations, we'll look at the rules and processes used to turn a viewport into a grid of pixels, using Direct3D as an example. This grid is like an Excel spreadsheet with rows and columns, in which each cell holds various data values (such as color, depth, texture coordinates, etc.). Typically this grid is called a raster image, and the process of generating it is called rasterization. In the 3D Game Rendering 101 article we looked at a simplified version of this procedure:

The image above gives the impression that the primitives are simply cut into small blocks, but in reality there are many more operations.
The very first step is to determine whether the primitive is facing the camera - for example, in the frustum image shown above, the primitives that make up the back of the gray rabbit will not be visible. So although they are present in the viewport, they do not need to be rendered. We can get a rough idea of what this looks like by looking at the diagram below. The cube went through various transformations to place the 3D model in the 2D space of the screen, and from the camera's point of view, some of the faces of the cube are not visible. If we assume that all surfaces are opaque, then some of these primitives can be ignored.

From left to right: world space > camera space > projection space > screen space
In Direct3D, this can be done by telling the system what the render state will be; this instruction tells it to cull (remove) the front- or back-facing side of each primitive, or not to cull at all (for example, in wireframe mode). But how does it know which side faces forward or backward? When we looked at the mathematics of vertex processing, we saw that triangles (or rather their vertices) have normal vectors that tell the system which way they face. Thanks to this information, a simple check can be performed, and if the primitive fails it, it is removed from the rendering chain.
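A minimal sketch of one way such a facing check can be done, using the signed area (winding order) of the projected triangle in screen space; Direct3D's cull mode performs an equivalent test, with the winding that counts as "front" chosen by the render state. The coordinates below are invented for illustration.

```python
def signed_area(p0, p1, p2):
    """Twice the signed area of the projected triangle; its sign encodes the winding."""
    return (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p1[1] - p0[1]) * (p2[0] - p0[0])

def is_front_facing(p0, p1, p2, front_is_ccw=True):
    """True if the screen-space winding matches the convention chosen as 'front'."""
    area = signed_area(p0, p1, p2)
    return area > 0 if front_is_ccw else area < 0

a, b, c = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
print(is_front_facing(a, b, c))   # True: counter-clockwise winding (with y up)
print(is_front_facing(a, c, b))   # False: clockwise, would be culled
```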

Now it's time to apply the pixel grid. This is again a surprisingly complex process because the system must figure out whether the pixel is completely inside the primitive, partially inside it, or not at all. To do this, a coverage testing process is performed. The image below shows how triangles are rasterized in Direct3D 11:

The rule is quite simple: a pixel is considered to be inside a triangle if the pixel center passes what Microsoft calls the "top-left" rule. "Top" refers to a horizontal edge: the pixel center must lie on this line. "Left" refers to non-horizontal edges: the pixel center must lie to the left of such a line. Other rules apply to the remaining primitives, such as lines and points, and extra if conditions appear in the rules when multisampling is used.
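Below is a simplified, hedged sketch of pixel-center coverage testing with edge functions and a top-left style tie-break, in the spirit of the Direct3D 11 rules described above; real hardware works in fixed point and handles more edge cases, and the exact classification of "top" and "left" edges depends on the winding and on whether y points up or down.

```python
def edge(ax, ay, bx, by, px, py):
    """Edge function of (a->b) at point p; positive means p is on the inner side."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def covers(v0, v1, v2, px, py):
    """Does a counter-clockwise triangle cover the pixel center (px, py)?"""
    for a, b in ((v0, v1), (v1, v2), (v2, v0)):
        e = edge(a[0], a[1], b[0], b[1], px, py)
        # Convention-dependent top/left classification; it only matters when the
        # pixel center lies exactly on an edge.
        top_left = (a[1] == b[1] and b[0] < a[0]) or (b[1] < a[1])
        if e < 0 or (e == 0 and not top_left):
            return False   # outside, or on an edge that does not own the pixel
    return True

tri = ((0.0, 0.0), (8.0, 0.0), (0.0, 8.0))
print(covers(*tri, 2.5, 2.5))     # True: pixel center inside the triangle
print(covers(*tri, 7.5, 7.5))     # False: outside
```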

If you look closely at the Microsoft documentation, you can see that the shapes created by the pixels are not very similar to the original primitives. This happens because the pixels are too large to create a realistic triangle - the raster image does not contain enough data about the original objects, which causes a phenomenon called aliasing.

Let's look at aliasing using the example of UL Benchmark 3DMark03:

720 x 480 pixel rasterization
In the first image, the raster image has a very low resolution of 720 x 480 pixels. The aliasing is clearly visible on the railings and the shadow cast by the top soldier's weapon. Compare this with the result obtained when rasterizing with 24 times the number of pixels:

Rasterization 3840 x 2160 pixels
Here we can see that the aliasing on the railings and shadows has completely disappeared. It looks like you should always use a large bitmap, but the grid dimensions should be supported by the monitor on which the frame will be displayed. And given that all these pixels need to be processed, there will obviously be a performance hit.

This is where multisampling can help. Here's how it works in Direct3D:

Instead of checking that the center of a pixel meets the rasterization rules, several points within each pixel (called subpixel samples or subsamples) are checked, and if any of them meet the requirements, then they form part of the shape. It may seem that there is no benefit here and the aliasing is even increased, but when using multisampling, information about which subsamples are covered by the primitive and the results of pixel processing are stored in a buffer in memory.

This buffer is then used to blend the subsample data and pixels so that the edges of the primitive are less jagged. We'll look at aliasing in more detail in another article, but for now this information is enough to understand what multisampling can do when used to rasterize too few pixels:

As you can see, the amount of aliasing at the edges of different shapes has decreased significantly. Higher resolution rasterization is definitely better, but the performance penalty may push you to use multisampling.
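As a rough sketch of the idea, the snippet below tests several sub-pixel sample positions instead of just the pixel center and reports the coverage fraction that would later be used for blending; the 4-sample pattern is illustrative and not an actual hardware sample layout.

```python
SAMPLE_OFFSETS = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]

def inside(tri, px, py):
    """Strict point-in-triangle test via edge functions (counter-clockwise winding)."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    e0 = (x1 - x0) * (py - y0) - (y1 - y0) * (px - x0)
    e1 = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
    e2 = (x0 - x2) * (py - y2) - (y0 - y2) * (px - x2)
    return e0 > 0 and e1 > 0 and e2 > 0

def coverage(tri, pixel_x, pixel_y):
    """Fraction of sub-pixel samples inside the triangle, in [0.0, 1.0]."""
    hits = sum(inside(tri, pixel_x + dx, pixel_y + dy) for dx, dy in SAMPLE_OFFSETS)
    return hits / len(SAMPLE_OFFSETS)

tri = ((0.0, 0.0), (8.0, 0.0), (0.0, 8.0))
print(coverage(tri, 3, 4))   # 0.25: a partially covered pixel near the diagonal edge
```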

Also during the rasterization process, occlusion testing is performed. This is necessary because the viewport will be filled with overlapping primitives - for example, in the picture above, the forward-facing triangles that make up the soldier standing in the foreground overlap the same triangles of another soldier. In addition to checking whether the primitive covers the pixel, relative depths can also be compared, and if one surface is behind the other, then it should be removed from the rest of the rendering process.

However, if the near primitive is transparent, then the far one should remain visible, even though it fails the occlusion check. This is why almost all 3D engines perform occlusion checks before sending data to the GPU, and instead create something called a z-buffer as part of the rendering process. Here the frame is created in the usual way, but instead of storing the finished pixel colors in memory, the GPU stores only the depth values. These can later be used in shaders to check visibility, with greater control and precision over how object occlusion is handled.

In the image shown above, the darker the pixel color, the closer the object is to the camera. The frame is rendered once to create the z-buffer, and then rendered again, but this time while the pixels are being processed, a shader is run that checks them against the values ​​in the z-buffer. If it is invisible, then the pixel color is not written to the finished frame buffer.
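A toy sketch of that depth test: the z-buffer stores the closest depth seen so far for each pixel, and a fragment is kept only if it is nearer. It assumes the convention that smaller z values (in [0, 1]) are closer; the buffer size and colors are arbitrary.

```python
import numpy as np

WIDTH, HEIGHT = 4, 4
depth_buffer = np.full((HEIGHT, WIDTH), 1.0)     # initialise to the far plane
color_buffer = np.zeros((HEIGHT, WIDTH, 3))

def write_fragment(x, y, z, color):
    """Depth-test a fragment; keep it only if it is closer than what is stored."""
    if z < depth_buffer[y, x]:
        depth_buffer[y, x] = z
        color_buffer[y, x] = color
        return True
    return False                                  # occluded: discarded

print(write_fragment(1, 1, 0.6, (1.0, 0.0, 0.0)))   # True: first surface at this pixel
print(write_fragment(1, 1, 0.8, (0.0, 1.0, 0.0)))   # False: farther away, fails the test
print(write_fragment(1, 1, 0.3, (0.0, 0.0, 1.0)))   # True: nearer surface replaces it
```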

Our last major step is interpolating the vertex attributes - in the original simplified diagram the primitive was a complete triangle, but don't forget that the viewport is filled only with the corners of the shapes, not the shapes themselves. The system therefore has to determine what the color, depth and texture of the primitive should be between the vertices, and this operation is called interpolation. As you might have guessed, this is yet another calculation, and not a simple one.

Although the rasterized screen is 2D, the structures within it represent a 3D perspective. If the lines were truly two-dimensional, then we could use a simple linear equation to calculate colors and the rest as we move from one vertex to the next. But because of the 3D nature of the scene, interpolation must take perspective into account; to learn more about this process, read Simon Yoon's excellent article.
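A tiny sketch of why the perspective matters: attributes are interpolated linearly only after being divided by w, and then recovered by dividing by the interpolated 1/w. The texture coordinate and w values below are invented to show how far the naive linear result drifts.

```python
def perspective_correct(t, attr0, w0, attr1, w1):
    """Interpolate an attribute between two projected vertices at screen fraction t."""
    one_over_w = (1 - t) / w0 + t / w1
    attr_over_w = (1 - t) * attr0 / w0 + t * attr1 / w1
    return attr_over_w / one_over_w

# Texture coordinate u goes from 0.0 (near vertex, w = 1) to 1.0 (far vertex, w = 5).
t = 0.5
print((1 - t) * 0.0 + t * 1.0)                      # naive linear result: 0.5
print(perspective_correct(t, 0.0, 1.0, 1.0, 5.0))   # perspective-correct: ~0.167
```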

So, the task is completed - this is how the 3D world of vertices turns into a 2D grid of multi-colored blocks. But we're not quite done yet.

What does Nvidia RTX mean for ray tracing, GPU rendering and V-Ray?

Following the big release of new graphics cards, Vlado explains what this breakthrough means for the future of rendering.

Over almost 20 years of research and development, we have been able to create V-Ray, the most technologically advanced photorealistic ray-traced renderer. Ray tracing is the best method for achieving true photorealism, as it is based on the physical principles of light behavior. For this reason, the Academy of Motion Picture Arts and Sciences recognized our contributions to ray-traced rendering by awarding us a Sci-Tech Award for its widespread use in the visual effects industry. We have always strived to make ray tracing faster, and ten years ago we started to harness the power of graphics cards. We are now looking to a future of hardware built specifically for ray tracing calculations. This means that we can now implement real-time ray-traced rendering.

Nvidia's announcement of the Turing architecture in the new line of RTX video cards is an important milestone in the history of computer graphics, and of ray tracing in particular. The professional Quadro RTX was introduced at SIGGRAPH 2018, and consumer GeForce RTX cards were presented at Gamescom 2018. These new graphics cards include a new RT Core unit dedicated exclusively to ray tracing, dramatically speeding up these tasks, and also bring a new interface, NVLink, to the consumer market, which allows the available memory to be doubled when two video cards are used. With the full line announced, it makes sense to spend a few minutes understanding what this means for the future of rendering.

RT Cores in RTX cards

Before we look at what these units provide, let's quickly go over the basics of ray tracing. The process of ray tracing a scene can be broadly divided into two parts: tracing and shading.

Raycasting

This is the process of finding the intersections of ray paths with objects in the scene. Objects consist of various geometric primitives - triangles, curves (for hair), particles and so on. A typical scene may contain hundreds of instances of objects and hundreds of millions of unique geometric primitives. Finding the intersection of a ray with these primitives is an involved operation that relies on complex data structures, such as bounding volume hierarchies (BVH), to reduce the amount of computation required.

Shading

Shading is the process of determining the appearance of an object, including evaluating textures and material properties, i.e. how the object reacts to light. Shading also determines which rays need to be traced to establish an object's appearance - for example, for calculating shadows from light sources, reflections, GI and the like. The shader tree can be quite complex, including the evaluation of procedural maps and their combinations in different shader slots such as reflection, diffuse and normals. This also includes lighting calculations.
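To make the raycasting half concrete, here is an illustrative sketch (not Chaos Group code) of the core operation it describes - testing a single ray against a single triangle, using the well-known Moller-Trumbore algorithm; the triangle and ray values are arbitrary.

```python
import numpy as np

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-8):
    """Return the distance t along the ray to the triangle, or None if it misses."""
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = np.dot(e1, p)
    if abs(det) < eps:
        return None                    # ray is parallel to the triangle's plane
    inv_det = 1.0 / det
    s = origin - v0
    u = np.dot(s, p) * inv_det
    if u < 0.0 or u > 1.0:
        return None
    q = np.cross(s, e1)
    v = np.dot(direction, q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None
    t = np.dot(e2, q) * inv_det
    return t if t > eps else None      # a hit only if it lies in front of the origin

tri = (np.array([0.0, 0.0, 5.0]), np.array([1.0, 0.0, 5.0]), np.array([0.0, 1.0, 5.0]))
print(ray_triangle(np.zeros(3), np.array([0.05, 0.05, 1.0]), *tri))   # 5.0: a hit
print(ray_triangle(np.zeros(3), np.array([0.0, 0.0, -1.0]), *tri))    # None: misses
```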

Depending on the amount of geometry in the scene and the complexity of the shading, the ratio between tracing and shading can vary greatly: rays can account for 80% of the time in very simple scenes, but only 20% in complex ones. The new RTX cards contain specialized RT cores to speed up raycasting. Since this is a rather complex algorithm, implementing it in hardware can deliver a significant acceleration. However, even if raycasting were infinitely fast and took no time at all, the speed-up from RT cores would still vary from scene to scene, depending on how much of the frame time raycasting takes. In general, scenes with simple shaders and a lot of geometry will benefit significantly more than scenes with simple geometry and complex shaders.

To illustrate the above, we rendered the same scene with the regular V-Ray GPU and with an experimental RTX-enabled build, first with a plain gray material and then with the original shaders. The scene contains 95,668,638,333 triangles and was rendered with 512 samples per pixel.

In the gray material scene, 76% of the calculation time was spent on raycasting

In the same scene with full-fledged materials, 59% of the time was already spent on it

While we are not quite ready to publish results obtained on Nvidia's yet-to-be-released hardware, we can describe the effect the new architecture will have. The scene above was rendered on a pre-release version of the Turing architecture with beta drivers and on an experimental build of V-Ray GPU in which we could control the number of rays involved in the calculation. With simpler shading, most of the render time is spent on raycasting, in which case we should see a greater acceleration from the RT cores. We are going to modify V-Ray GPU to get the most out of the new hardware. It is also worth mentioning that the Turing architecture itself is significantly faster than the previous Pascal, even when running V-Ray GPU without any modifications.

It's important to note that applications must be modified substantially to take advantage of RT Cores, meaning that existing ray-tracing solutions will not receive the speed-up automatically. The cores are accessed through three APIs - NVIDIA OptiX, Microsoft DirectX (via the DXR extension) and Vulkan. The last two are intended for real-time use in game engines, while OptiX is better suited to production and offline rendering.

We at Chaos Group have been working with Nvidia for about a year to find ways to use RT Cores in our products. V-Ray GPU is an obvious application of the new technology and we already have experimental builds, however, optimizing the code to fully support all features will take time. For now, we note that all current releases work great with new cards, although they cannot use RT acceleration. With the addition of their support, new versions of V-Ray will continue to support previous generations of cards as before.

In the video below we show a version of the V-Ray GPU modified to use RT cores. Our goal was not to show performance - we will publish benchmarks in a separate blog post after the official release of the hardware.

We have also studied the capabilities of RT cores for real-time ray tracing in our Project Lavina, in order to understand what the hardware can do and whether rasterization could be replaced with ray tracing entirely in such cases. DXR was the first API for real-time calculations using the new hardware unit, so Project Lavina is based on it; we are also considering Vulkan to support Linux systems in the future. Initial results are very promising and we continue to develop and improve this technology. Obviously, this is only the beginning - we are currently exploring the possibilities of real-time tracing and expect rapid progress in this direction in the coming months, which will give our users new ways to work with scenes in real time without the laborious process of converting them for game engines.

As always, our solutions are based solely on ray tracing - unlike game engines, which rely on this technology only in part. However, RT Cores are only part of the story. RTX cards also support NVLink, which makes the total memory of the cards available for rendering with minimal impact on performance.

NVLink

NVLink is a port that allows two or more GPUs to be connected and to exchange information at very high speed. This means that one GPU can access the memory of another. Consequently, programs such as V-Ray can place scenes that are too large for a single card into the combined memory of several video cards. Typically, during rendering, the scene is duplicated in the memory of every card involved in the calculation, but NVLink allows their memory to be pooled: for example, two accelerators with 11 GB on board each will together provide 22 GB. NVLink was introduced in 2016, and V-Ray was the first renderer to officially support it, in version 3.6 and later. Until now, NVLink remained the preserve of Quadro and Tesla, but it has now entered the consumer market.

Conclusion

There have been attempts to create specialized ray tracing hardware in the past, but they largely failed - partly because shading and raycasting are usually closely intertwined, and trying to run them on different hardware did not deliver the desired efficiency. The ability to compute both kinds of algorithms on one device is what makes Nvidia RTX an interesting architecture. We expect this series of cards to have a major impact on the industry and to firmly establish GPU ray tracing as a technology for both real-time and offline rendering. We at Chaos Group are working hard to let our users take advantage of the new hardware.

Vlado Koylazov, CTO and founder of Chaos Group.

Original

Translation - Andrey Orlov, admin of the Motion Picture community

Front to back (with some exceptions)

Before we finish looking at rasterization, we need to talk about the order of the rendering sequence.
We're not talking about the stage where tessellation appears in the processing sequence, for example; we mean the order in which the primitives are processed. Objects are typically processed in the order in which they are found in the index buffer (the block of memory that tells the system how vertices are grouped together) and this can significantly influence the way transparent objects and effects are processed. The reason for this comes down to the fact that primitives are rendered one at a time, and if the ones in front are rendered first, then all those behind them will be invisible (this is where occlusion culling comes into play) and can be thrown out of the process (helping preserve performance). This is usually called front-to-back rendering, and for this process the index buffer must be ordered this way.

However, if some of these primitives right in front of the camera are transparent, then rendering from front to back will cause objects behind the transparent to be lost. One solution is to render from back to front, where transparent primitives and effects are calculated last.

From left to right: order in the scene, rendering from front to back, rendering from back to front
So, do all modern games render back to front? Not if they can help it: don't forget that rendering every single primitive carries a much greater performance penalty than rendering only what can actually be seen. There are other ways of handling transparent objects, but in general there is no ideal solution that fits every system, and each situation has to be considered separately.
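A small sketch of how an engine might order its draw calls along these lines: opaque geometry sorted front to back so the depth test can reject hidden fragments early, transparent geometry sorted back to front so blending works. The object names and depths are made up purely for illustration.

```python
draw_calls = [
    {"name": "window", "depth": 2.0, "transparent": True},
    {"name": "wall",   "depth": 5.0, "transparent": False},
    {"name": "player", "depth": 1.0, "transparent": False},
    {"name": "smoke",  "depth": 3.5, "transparent": True},
]

# Opaque: nearest first, so later (hidden) fragments fail the depth test early.
opaque = sorted((d for d in draw_calls if not d["transparent"]), key=lambda d: d["depth"])
# Transparent: farthest first, so alpha blending composites in the right order.
transparent = sorted((d for d in draw_calls if d["transparent"]),
                     key=lambda d: d["depth"], reverse=True)

for call in opaque + transparent:
    print("draw", call["name"], "at depth", call["depth"])
```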

Essentially, this gives us an idea of the main pros and cons of rasterization: on modern hardware it is a fast and efficient process, but it is still an approximation of what we see. In the real world, every object can absorb, reflect and sometimes refract light, and all of this affects the final appearance of the scene. By dividing the world into primitives and rendering only some of them, we get a fast but very approximate result.

If only there was another way...

There is another way: ray tracing!

Nearly fifty years ago, a computer scientist named Arthur Appel was working on a system for rendering images on a computer, in which a single ray of light was cast from the camera in a straight line until it struck an object.
After the collision, the properties of the material (its color, reflectivity, etc.) changed the brightness of the ray. One ray was cast for each pixel of the rendered image, and the algorithm performed a chain of calculations to determine the color of that pixel. Appel's process is called ray casting. About ten years later, another scientist, Turner Whitted, developed a mathematical algorithm that extended Appel's process: when a ray collided with an object, it generated additional rays diverging in different directions depending on the object's material. Since this system generated new rays whenever it interacted with objects, the algorithm was recursive in nature and much more computationally expensive; however, it had a significant advantage over Appel's technique in that it could properly account for reflections, refractions and shadows. This procedure came to be called ray tracing (strictly speaking, backwards ray tracing, because we follow the rays from the camera rather than from the light sources), and it has since become the holy grail of computer graphics and film.

From the image shown above, you can understand how Whitted's algorithm works. For each pixel in the frame, one ray is emitted from the camera and travels until it reaches the surface. In this example, the surface is translucent, so light can be reflected and refracted through it. In both cases, secondary rays are generated and travel until they hit the surface. New secondary rays are also generated to take into account the color of light sources and the shadows they create.

The recursive nature of the process means that secondary rays can be generated every time a newly cast ray intersects a surface. This could quickly get out of control, so the number of secondary rays generated is always limited. Once a ray path is complete, the color at each end point is calculated based on the material properties of that surface. This value is then passed back along the ray to the previous surface, modifying its color, and so on, until we reach the starting point of the primary ray, namely a pixel in the frame.
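A compact, hedged sketch of that recursive (Whitted-style) structure: a primary ray is traced, and on a hit a single reflection ray is spawned, with a hard depth limit so the recursion cannot run away. The one-sphere "scene", colors, light direction and reflectivity are invented purely for illustration, and ray directions are assumed to be normalized.

```python
import numpy as np

SPHERE_CENTER, SPHERE_RADIUS = np.array([0.0, 0.0, 5.0]), 1.0
LIGHT_DIR = np.array([1.0, 1.0, -1.0]) / np.sqrt(3.0)   # unit vector toward an assumed light
BACKGROUND = np.array([0.2, 0.3, 0.8])
SPHERE_COLOR = np.array([0.9, 0.4, 0.4])
REFLECTIVITY, MAX_DEPTH = 0.4, 3

def hit_sphere(origin, direction):
    """Return the distance t to the sphere along a normalized ray, or None on a miss."""
    oc = origin - SPHERE_CENTER
    b = 2.0 * np.dot(oc, direction)
    c = np.dot(oc, oc) - SPHERE_RADIUS ** 2
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None
    t = (-b - np.sqrt(disc)) / 2.0
    return t if t > 1e-4 else None

def trace(origin, direction, depth=0):
    """Color for one ray; spawns a single reflection ray up to MAX_DEPTH bounces."""
    t = hit_sphere(origin, direction)
    if t is None:
        return BACKGROUND
    point = origin + t * direction
    normal = (point - SPHERE_CENTER) / SPHERE_RADIUS
    local = SPHERE_COLOR * max(np.dot(normal, LIGHT_DIR), 0.0)        # simple diffuse term
    if depth + 1 >= MAX_DEPTH:
        return local
    reflected = direction - 2.0 * np.dot(direction, normal) * normal  # mirror direction
    return (1 - REFLECTIVITY) * local + REFLECTIVITY * trace(point, reflected, depth + 1)

print(trace(np.zeros(3), np.array([0.0, 0.0, 1.0])))  # primary ray through the sphere's center
```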

Such a system can be extremely complex, and even simple scenes can generate a huge amount of computation. Luckily, there are tricks that make things easier - first, you can use hardware specifically designed to speed up these math operations, just as there is hardware for the matrix math of vertex processing (more on that in a moment). Another key trick is trying to speed up the process of identifying which object a ray hits and exactly where they intersect - if the object is made up of many triangles, this can be surprisingly difficult:

Source: Real-Time Ray Tracing with Nvidia RTX
Instead of testing every individual triangle of every object against each ray, a list of bounding volumes (BVs) is generated beforehand - these are simple boxes that enclose the object, with progressively smaller bounding volumes created for the various structures within it.

For example, the first BV would be the entire rabbit. The next level would describe its head, legs, body, tail, and so on; each of these volumes would in turn be another collection of volumes for the smaller structures of the head, body, etc., and the final level of volumes would contain a small number of triangles to be tested directly. All these volumes are usually arranged in an ordered structure (called a BV hierarchy, or BVH); thanks to this, the system only checks a relatively small number of BVs each time:

While using BVH does not, strictly speaking, speed up ray tracing itself, generating the hierarchy and the required subsequent search algorithm is generally much faster than checking whether a single ray intersects one of the millions of triangles in the 3D world.
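A hedged sketch of the first step of such a traversal: testing a ray against an axis-aligned bounding box with the classic "slab" method, so that a miss lets the whole subtree of smaller volumes and triangles be skipped. The box and ray values are illustrative, and the code assumes no ray direction component is exactly zero.

```python
import numpy as np

def ray_hits_aabb(origin, direction, box_min, box_max):
    """Slab test: does the ray intersect the axis-aligned bounding box?"""
    inv_dir = 1.0 / direction                 # assumes no zero components, for brevity
    t1 = (box_min - origin) * inv_dir
    t2 = (box_max - origin) * inv_dir
    t_near = np.max(np.minimum(t1, t2))       # latest entry across the three slabs
    t_far = np.min(np.maximum(t1, t2))        # earliest exit across the three slabs
    return t_far >= max(t_near, 0.0)

box_min, box_max = np.array([-1.0, -1.0, 4.0]), np.array([1.0, 1.0, 6.0])
print(ray_hits_aabb(np.zeros(3), np.array([0.01, 0.01, 1.0]), box_min, box_max))  # True
print(ray_hits_aabb(np.zeros(3), np.array([1.0, 0.2, 0.1]), box_min, box_max))    # False: skip the subtree
```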

Today, programs such as Blender and POV-ray use ray tracing with additional algorithms (such as photon tracing and radiosity) to generate highly realistic images:

The obvious question might be: if ray tracing is so good, why isn't it used everywhere? The answer lies in two areas. First, even simple ray tracing creates millions of rays that have to be calculated over and over again. The system starts with just one ray per screen pixel, meaning that at a resolution of 800 x 600 it generates 480,000 primary rays, each of which then generates many secondary rays. This is very hard work even for modern desktop PCs. The second problem is that basic ray tracing is not particularly realistic, and a whole set of additional, very complex equations is needed to do it properly.

Even on modern hardware, the amount of work involved in 3D games is unattainable for real-time implementation. We saw in 3D rendering 101 that the ray tracing benchmark takes tens of seconds to create a single low-resolution image.

How did the original Wolfenstein 3D manage ray casting back in 1992, and why do games like Battlefield V (2018) and Metro Exodus (2019) offer ray tracing? Do they do rasterization or ray tracing? A little bit of both.

What is Ray Tracing?

As in a school physics class, before studying a new topic let's start with a definition of the term. Ray tracing is a rendering technique based on the principles of real physical processes. To build a three-dimensional model of an object and apply ray tracing to it, the system tracks the trajectory of a virtual ray to that object, taking into account the surface of the object and the properties of its material. The light is then tracked using multiple rays that simulate reflected light. This is how ray tracing accounts for refraction and reflection of rays, as well as the correct interaction of light with any surface, including mirrored ones. And since light can change color when it is reflected from an object, the system needs to take that into account too.

In theory, ray tracing is a simple process, the roots of which come from physics, and tracing itself is far from new. But everything is simple only in theory. In practice, ray tracing is incredibly labor-intensive from a technical point of view, because some rays may not be reflected at all, some may be reflected only a couple of times, and some rays within the same scene can be reflected an infinite number of times. And for everything to be reflected correctly, the system needs to calculate absolutely every ray. Completely accurate and correct ray tracing requires very high computing power, but even in this case, it is a very lengthy process.

Filmmakers, by the way, have been using this technology for a long time. You may have seen ray tracing in the cinema, for example in the 1982 film Tron. Typically, ray tracing in films is added in post-production, so filmmakers don't have to calculate the behavior of light sources in real time - they only need to do it once, when rendering the film. Even then, rendering the rays for a single frame can take many hours. In games, however, developers can never predict in advance where the player will go and from which side they will look at an object, so they cannot calculate the reflections and refractions of rays once, as is done in the movies. Games therefore need real-time ray tracing, which is an incredibly demanding process. It is precisely because of this complexity that ray tracing's arrival in the gaming industry was delayed for so long.

A hybrid approach for today and the future

In March 2018, Microsoft announced a new API extension for Direct3D 12 called DXR (DirectX Raytracing). This was a new graphics pipeline that complemented the standard rasterization and compute pipelines. The additional functionality was provided by new shaders, data structures and so on, but it did not require hardware support beyond what was already needed for Direct3D 12.

At the same Game Developers Conference where Microsoft talked about DXR, Electronic Arts talked about its Pica Pica Project, an experiment with a 3D engine using DXR.
The company showed that ray tracing can be used, but not for rendering the entire frame. The bulk of the work is done with traditional rasterization and compute shader techniques, with DXR applied only in specific areas; that is, far fewer rays are generated than would be needed for the whole scene. This hybrid approach had been used before, although to a lesser extent. For example, Wolfenstein 3D used ray casting to render its frames, but with one ray per column of pixels rather than per pixel. This may still sound impressive, until you remember that the game ran at a resolution of 640 x 480 [translator's note: actually 320 x 200], meaning no more than 640 rays were ever cast at once. Graphics cards of the time, such as the 2017-era AMD Radeon RX 580 or Nvidia GeForce GTX 1080 Ti, met DXR's requirements, but even with their compute capabilities there were concerns that they would not be powerful enough to make DXR worthwhile.

That changed in August 2018, when Nvidia announced its latest GPU architecture, codenamed Turing. The most important feature of this chip was the appearance of so-called RT Cores: separate logic blocks that speed up ray-triangle intersection calculations and traversal of the bounding volume hierarchy (BVH). These two time-consuming procedures determine where rays interact with the triangles that make up the objects in the scene. Given that RT Cores were unique blocks of the Turing processor, they could only be accessed through Nvidia's proprietary API.

The first game to support this feature was EA's Battlefield V. When we tested DXR in it, we were impressed by the improvement in reflections on water, grass and metals, as well as the corresponding decrease in performance:

To be honest, subsequent patches improved the situation, but the reduction in frame rendering speed was still present (and still is). By 2019, there were some other games that supported this API and performed ray tracing on individual parts of the frame. We tested Metro Exodus and Shadow of the Tomb Raider, and encountered the same situation - with active use, DXR noticeably reduces the frame rate.

Around the same time, UL Benchmarks announced the creation of a DXR feature test for 3DMark:

DXR is used in the Nvidia Titan X (Pascal) graphics card - yes, the result is 8 fps
However, a study of DXR-enabled games and the 3DMark feature test shows that ray tracing, even in 2020, is still a very demanding task for a GPU, even one costing more than $1,000. Does this mean we have no real alternative to rasterization?

Progressive features in consumer 3D graphics technologies are often very expensive, and their initial support for new API capabilities can be quite fragmented or slow (as we found out when testing Max Payne 3 on different versions of Direct3D in 2012). The latter problem usually arises because game developers try to include as many modern features as possible in their products, sometimes without having enough experience to do so.

However, vertex and pixel shaders, tessellation, HDR rendering and screen space ambient occlusion were also once expensive techniques suitable only for powerful GPUs, and now they are standard for games and are supported by many graphics cards. The same will happen with ray tracing; Over time, it will simply become another detail setting that is turned on by default for most players.

The main problems of ray tracing in games

"OK, ray tracing is cool! But is everything really that smooth?" - you might reasonably ask. Yes, indeed, ray tracing is an important component of graphics in games. It takes picture quality to a whole new level, adding a cinematic feel and rich visual effects that no one could have dreamed of before. But ray tracing still has a couple of global problems:

  • Very large “load” on performance.
  • Not many games support RTX ray tracing

Ray tracing, as we mentioned earlier, requires a lot of computing power. GeForce RTX video cards, although optimized specifically for this task, don't always cope with tracing in games, especially at high resolutions. Of course, NVIDIA, together with game developers, regularly releases driver updates that improve ray tracing performance, but sometimes even the most powerful video card in the line, the RTX 2080 Ti, struggles to deliver smooth frame rates with RTX enabled at 4K resolution and maximum graphics settings.

"Where once the table was laden with food, a coffin now stands" - if the first problem can be solved by hard work on optimizing games, the second is far more global: today there are not many games with a full implementation of ray tracing. We deliberately emphasize the word "full", because there are only a couple of games where ray tracing is present to the maximum; the rest were a little less fortunate. For example, Battlefield V implements only reflection and refraction effects, Shadow of the Tomb Raider uses RTX technology only for shadows, and Metro: Exodus has global illumination and shading.

The games where ray tracing is presented fully and in all its glory are Control and Quake II RTX, the latter released by the developers together with NVIDIA. And if everything is more or less clear with Control, since it is a relatively recent release, the situation with Quake II RTX is more interesting. In this significantly improved re-release of the legendary classic, not only have the textures been updated and all the models redrawn (with the help of user modifications); on the tracing side there are also realistic reflections, refractions and shadows with global illumination - in other words, the complete set, which ought to be the default in every game that supports ray tracing. Unfortunately, the number of games equipped with ray tracing can currently be counted on the fingers of one hand, not counting tech demos created solely to demonstrate the technology to the public.
