Project 4: Lan Lou #5

Open
wants to merge 16 commits into
base: master
80 changes: 72 additions & 8 deletions README.md
@@ -1,19 +1,83 @@
CUDA Rasterizer
===============

-[CLICK ME FOR INSTRUCTION OF THIS PROJECT](./INSTRUCTION.md)
### Sample Rasterization

-**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4**
#### ```resolution```: 900x900 ```GLTF model```: cesiummilktruck ```shader```: Blinn-Phong, perspective-corrected, bilinear-filtered texture
![](https://github.com/LanLou123/Project4-CUDA-Rasterizer/raw/master/renders/truck.gif)

-* (TODO) YOUR NAME HERE
-* (TODO) [LinkedIn](), [personal website](), [twitter](), etc.
-* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
### Introduction

-### (TODO: Your README)
Rasterization is an efficient rendering technique commonly used in computer graphics, especially in games. Like path tracing and ray tracing, it ultimately does one thing: turn a 3D scene into a 2D image on the screen.
Unlike ray tracing or path tracing, however, rasterization does not follow rays as they interact further with the geometry. Instead, it projects each primitive onto the screen, determines which pixels the primitive covers, and records per-pixel results such as color, depth, and specular, which are then used to shade the scene. As a consequence, rasterization is much more efficient, but it is harder to reach a realistic result.

-*DO NOT* leave the README to the last minute! It is a crucial part of the
-project, and we will not be able to grade you without a good README.
### Features:

- Basic features:
  - Vertex shading
  - Primitive assembly
  - Rasterization
  - Fragment shading
  - A depth buffer for storing and depth testing fragments
  - Fragment-to-depth-buffer writing (with atomics for race avoidance)
  - Simple lighting scheme with togglable Lambert and Blinn-Phong shading
- Extra:
  - UV texture mapping with bilinear texture filtering and perspective-correct texture coordinates
  - Support for rasterizing additional primitives (lines and points) with a toggle
  - Correct color interpolation on a primitive
  - Tried SSAO, but the result is not accurate

## Debug view:

albedo buffer|depth buffer|
------------|--------
![](https://github.com/LanLou123/Project4-CUDA-Rasterizer/raw/master/renders/diffuse.gif) | ![](https://github.com/LanLou123/Project4-CUDA-Rasterizer/raw/master/renders/depth.gif)

normal buffer|specular buffer|
------------|--------
![](https://github.com/LanLou123/Project4-CUDA-Rasterizer/raw/master/renders/normal.gif) | ![](https://github.com/LanLou123/Project4-CUDA-Rasterizer/raw/master/renders/spec.gif)

#### combined:
![](https://github.com/LanLou123/Project4-CUDA-Rasterizer/raw/master/renders/res.gif)
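
The togglable Lambert and Blinn-Phong lighting from the feature list combines buffers like the ones shown in the debug views. The sketch below is one common way to write it and is not taken from this project's fragment shader: the `Vec3` type, the helper functions, and the parameters `specularStrength` and `shininess` are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Assumes the input vector is not the zero vector.
inline Vec3 normalize(Vec3 v) { return v * (1.0f / std::sqrt(dot(v, v))); }

// Lambert diffuse term, with an optional Blinn-Phong specular highlight.
// toLight and toEye point away from the surface.
inline Vec3 shade(Vec3 albedo, Vec3 normal, Vec3 toLight, Vec3 toEye,
                  float specularStrength, float shininess, bool useBlinnPhong) {
    Vec3 n = normalize(normal);
    Vec3 l = normalize(toLight);
    float lambert = std::max(dot(n, l), 0.0f);
    Vec3 color = albedo * lambert;
    if (useBlinnPhong && lambert > 0.0f) {
        // Half vector between the light direction and the view direction.
        Vec3 h = normalize(l + normalize(toEye));
        float spec = specularStrength * std::pow(std::max(dot(n, h), 0.0f), shininess);
        color = color + Vec3{spec, spec, spec};
    }
    return color;
}
```

With `useBlinnPhong` set to `false` the function reduces to plain Lambert shading, which mirrors the toggle described in the feature list.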

## Support for other primitives:

point|line|triangle
-----|----|-----
![](https://github.com/LanLou123/Project4-CUDA-Rasterizer/raw/master/renders/p.gif) | ![](https://github.com/LanLou123/Project4-CUDA-Rasterizer/raw/master/renders/line.gif) | ![](https://github.com/LanLou123/Project4-CUDA-Rasterizer/raw/master/renders/lamm.gif)




## perspective correct texture coordinates

In the following two comparisons, the right-hand image shows what we get when attributes such as normal, albedo, and depth are interpolated with simple linear interpolation in screen space. The result is clearly wrong and looks weird. The reason is that this interpolation uses only the barycentric coordinates of the fragment and the values at the triangle's vertices; it does not take depth (z) into account, which matters when 3D data is projected onto the 2D screen (the depth information must not be lost). What we should do instead is use both the barycentric coordinates and the z values to compute the result.

corrected duck|not-corrected duck
-----|----
![](https://github.com/LanLou123/Project4-CUDA-Rasterizer/raw/master/renders/yes.gif) | ![](https://github.com/LanLou123/Project4-CUDA-Rasterizer/raw/master/renders/non.gif)

corrected checkerboard|not-corrected checkerboard
-----|----
![](https://github.com/LanLou123/Project4-CUDA-Rasterizer/raw/master/renders/perspcorrect.JPG) | ![](https://github.com/LanLou123/Project4-CUDA-Rasterizer/raw/master/renders/notcorrected.JPG)

## correct color interpolation between points on a primitive

![](https://github.com/LanLou123/Project4-CUDA-Rasterizer/raw/master/renders/colorinterp.gif)

# Performance analysis

## breakdown of pipeline time consumption:

![](https://github.com/LanLou123/Project4-CUDA-Rasterizer/raw/master/renders/imgg.JPG)

As the graph above shows, the vertex transform costs almost the same amount of time for every model, since it is basically a parallel data-copying process. For the same reason, primitive assembly time is the same regardless of the model. Most of the time is spent in rasterization, because each thread performs a lot of iterative checks. Moreover, I put the bilinear filtering inside the triangle rasterization kernel, which may push the time even higher. Finally, rendering time is also the same for all models, simply because the shaders are too simple.
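
For reference, the per-fragment work that bilinear filtering adds can be sketched as follows. This is a CPU illustration under stated assumptions, not the code inside the rasterization kernel: the texture is single-channel and row-major, texel centers sit at half-integer coordinates, and lookups clamp to the edge. `Texture`, `fetchTexel`, and `sampleBilinear` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Texture {
    int width;
    int height;
    std::vector<float> texels;  // single channel, row-major
};

// Clamp-to-edge texel lookup.
inline float fetchTexel(const Texture& t, int x, int y) {
    x = std::min(std::max(x, 0), t.width - 1);
    y = std::min(std::max(y, 0), t.height - 1);
    return t.texels[y * t.width + x];
}

// Bilinear sample at normalized coordinates (u, v) in [0, 1].
inline float sampleBilinear(const Texture& t, float u, float v) {
    // Shift by half a texel so that texel centers map to integer positions.
    float fx = u * t.width - 0.5f;
    float fy = v * t.height - 0.5f;
    int x0 = static_cast<int>(std::floor(fx));
    int y0 = static_cast<int>(std::floor(fy));
    float tx = fx - x0;
    float ty = fy - y0;
    // Four texel fetches and three linear interpolations per sample.
    float top = fetchTexel(t, x0, y0) * (1.0f - tx) + fetchTexel(t, x0 + 1, y0) * tx;
    float bottom = fetchTexel(t, x0, y0 + 1) * (1.0f - tx) + fetchTexel(t, x0 + 1, y0 + 1) * tx;
    return top * (1.0f - ty) + bottom * ty;
}
```

Each sample reads four texels instead of one, which is consistent with the guess above that filtering inside the rasterization kernel adds to its cost.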

## with and without perspective correction:

![](https://github.com/LanLou123/Project4-CUDA-Rasterizer/raw/master/renders/pcomp.JPG)

The image above was measured on the duck model with and without perspective correction. It shows that perspective correction causes some decrease in rasterization efficiency, probably because of the extra computation with z values that is needed to interpolate the attributes.

### Credits

Binary file added renders/colorinterp.gif
Binary file added renders/depth.gif
Binary file added renders/diffuse.gif
Binary file added renders/imgg.JPG
Binary file added renders/lambert.gif
Binary file added renders/lamm.gif
Binary file added renders/line.gif
Binary file added renders/non.gif
Binary file added renders/normal.gif
Binary file added renders/notcorrected.JPG
Binary file added renders/p.gif
Binary file added renders/pcomp.JPG
Binary file added renders/perspcorrect.JPG
Binary file added renders/res.gif
Binary file added renders/spec.gif
Binary file added renders/truck.gif
Binary file added renders/yes.gif
4 changes: 3 additions & 1 deletion src/CMakeLists.txt
@@ -2,9 +2,11 @@ set(SOURCE_FILES
     "rasterize.cu"
     "rasterize.h"
     "rasterizeTools.h"
+    "common.h"
+    "common.cu"
     )

 cuda_add_library(src
     ${SOURCE_FILES}
-    OPTIONS -arch=sm_20
+    OPTIONS -arch=sm_60
     )
51 changes: 51 additions & 0 deletions src/common.cu
@@ -0,0 +1,51 @@
#include "common.h"

#include <cstdlib>

void checkCUDAErrorFn(const char *msg, const char *file, int line) {
    cudaError_t err = cudaGetLastError();
    if (cudaSuccess == err) {
        return;
    }

    fprintf(stderr, "CUDA error");
    if (file) {
        fprintf(stderr, " (%s:%d)", file, line);
    }
    fprintf(stderr, ": %s: %s\n", msg, cudaGetErrorString(err));
    exit(EXIT_FAILURE);
}


namespace StreamCompaction {
    namespace Common {

        /**
         * Maps an array to an array of 0s and 1s for stream compaction. Elements
         * which map to 0 will be removed, and elements which map to 1 will be kept.
         */
        __global__ void kernMapToBoolean(int n, int *bools, const int *idata) {
            int idx = blockIdx.x * blockDim.x + threadIdx.x;
            if (idx < n)
            {
                // Write both outcomes so that bools never keeps a stale value
                // for elements that map to 0.
                bools[idx] = idata[idx] ? 1 : 0;
            }
        }

        /**
         * Performs scatter on an array. That is, for each element in idata,
         * if bools[idx] == 1, it copies idata[idx] to odata[indices[idx]].
         */
        __global__ void kernScatter(int n, int *odata,
                const int *idata, const int *bools, const int *indices) {
            int idx = blockIdx.x * blockDim.x + threadIdx.x;
            if (idx < n)
            {
                if (bools[idx])
                    odata[indices[idx]] = idata[idx];
            }
        }

    }
}
132 changes: 132 additions & 0 deletions src/common.h
@@ -0,0 +1,132 @@
#pragma once

#include <cuda.h>
#include <cuda_runtime.h>

#include <cstdio>
#include <cstring>
#include <cmath>
#include <algorithm>
#include <chrono>
#include <stdexcept>

#define FILENAME (strrchr(__FILE__, '/') ? strrchr(__FILE__, '/') + 1 : __FILE__)
#define checkCUDAError(msg) checkCUDAErrorFn(msg, FILENAME, __LINE__)

/**
 * Check for CUDA errors; print and exit if there was a problem.
 */
void checkCUDAErrorFn(const char *msg, const char *file = NULL, int line = -1);

inline int ilog2(int x) {
    int lg = 0;
    while (x >>= 1) {
        ++lg;
    }
    return lg;
}

inline int ilog2ceil(int x) {
    return x == 1 ? 0 : ilog2(x - 1) + 1;
}

namespace StreamCompaction {
    namespace Common {
        __global__ void kernMapToBoolean(int n, int *bools, const int *idata);

        __global__ void kernScatter(int n, int *odata,
                const int *idata, const int *bools, const int *indices);

        /**
         * This class is used for timing the performance
         * Uncopyable and unmovable
         *
         * Adapted from WindyDarian(https://github.com/WindyDarian)
         */
        class PerformanceTimer
        {
        public:
            PerformanceTimer()
            {
                cudaEventCreate(&event_start);
                cudaEventCreate(&event_end);
            }

            ~PerformanceTimer()
            {
                cudaEventDestroy(event_start);
                cudaEventDestroy(event_end);
            }

            void startCpuTimer()
            {
                if (cpu_timer_started) { throw std::runtime_error("CPU timer already started"); }
                cpu_timer_started = true;

                time_start_cpu = std::chrono::high_resolution_clock::now();
            }

            void endCpuTimer()
            {
                time_end_cpu = std::chrono::high_resolution_clock::now();

                if (!cpu_timer_started) { throw std::runtime_error("CPU timer not started"); }

                std::chrono::duration<double, std::milli> duro = time_end_cpu - time_start_cpu;
                prev_elapsed_time_cpu_milliseconds =
                    static_cast<decltype(prev_elapsed_time_cpu_milliseconds)>(duro.count());

                cpu_timer_started = false;
            }

            void startGpuTimer()
            {
                if (gpu_timer_started) { throw std::runtime_error("GPU timer already started"); }
                gpu_timer_started = true;

                cudaEventRecord(event_start);
            }

            void endGpuTimer()
            {
                cudaEventRecord(event_end);
                cudaEventSynchronize(event_end);

                if (!gpu_timer_started) { throw std::runtime_error("GPU timer not started"); }

                cudaEventElapsedTime(&prev_elapsed_time_gpu_milliseconds, event_start, event_end);
                gpu_timer_started = false;
            }

            float getCpuElapsedTimeForPreviousOperation() //noexcept //(damn I need VS 2015
            {
                return prev_elapsed_time_cpu_milliseconds;
            }

            float getGpuElapsedTimeForPreviousOperation() //noexcept
            {
                return prev_elapsed_time_gpu_milliseconds;
            }

            // remove copy and move functions
            PerformanceTimer(const PerformanceTimer&) = delete;
            PerformanceTimer(PerformanceTimer&&) = delete;
            PerformanceTimer& operator=(const PerformanceTimer&) = delete;
            PerformanceTimer& operator=(PerformanceTimer&&) = delete;

        private:
            cudaEvent_t event_start = nullptr;
            cudaEvent_t event_end = nullptr;

            using time_point_t = std::chrono::high_resolution_clock::time_point;
            time_point_t time_start_cpu;
            time_point_t time_end_cpu;

            bool cpu_timer_started = false;
            bool gpu_timer_started = false;

            float prev_elapsed_time_cpu_milliseconds = 0.f;
            float prev_elapsed_time_gpu_milliseconds = 0.f;
        };
    }
}
13 changes: 8 additions & 5 deletions src/main.cpp
@@ -17,7 +17,7 @@
 //-------------------------------
 //-------------MAIN--------------
 //-------------------------------
-
+int starttime;
 int main(int argc, char **argv) {
     if (argc != 2) {
         cout << "Usage: [gltf file]. Press Enter to exit" << endl;
@@ -57,6 +57,7 @@ int main(int argc, char **argv) {
     // Launch CUDA/GL
     if (init(scene)) {
         // GLFW main loop
+        starttime = GetTickCount();
         mainLoop();
     }

@@ -97,16 +98,18 @@ void mainLoop() {
 //---------RUNTIME STUFF---------
 //-------------------------------
 float scale = 1.0f;
-float x_trans = 0.0f, y_trans = 0.0f, z_trans = -10.0f;
+float x_trans = 0.0f, y_trans = 0.0f, z_trans = -10.f;
 float x_angle = 0.0f, y_angle = 0.0f;
 void runCuda() {
     // Map OpenGL buffer object for writing from CUDA on a single GPU
     // No data is moved (Win & Linux). When mapped to CUDA, OpenGL should not use this buffer
     dptr = NULL;
+    int timert = GetTickCount() - starttime;

+    //y_angle = 0.001*timert;
     glm::mat4 P = glm::frustum<float>(-scale * ((float)width) / ((float)height),
         scale * ((float)width / (float)height),
-        -scale, scale, 1.0, 1000.0);
+        -scale, scale, 3, 1000.0);

     glm::mat4 V = glm::mat4(1.0f);

@@ -382,13 +385,13 @@ void mouseMotionCallback(GLFWwindow* window, double xpos, double ypos)
     if (mouseState == ROTATE)
     {
         //rotate
-        x_angle += (float)s_r * diffy;
+        x_angle += -(float)s_r * diffy;
         y_angle += (float)s_r * diffx;
     }
     else if (mouseState == TRANSLATE)
     {
         //translate
-        x_trans += (float)(s_t * diffx);
+        x_trans += (float)(-s_t * diffx);
         y_trans += (float)(-s_t * diffy);
     }
 }