
Commits : Listings

Analyzed about 12 hours ago, based on code collected about 18 hours ago.
Apr 23, 2023 — Apr 23, 2024
Commit Message Contributor Files Modified Lines Added Lines Removed Code Location Date
* updated copyright years
a2flo
as Florian Ziesche
23 days ago
* occ: added CMake support
a2flo
as Florian Ziesche
26 days ago
* hlbvh: removed debug output
a2flo
as Florian Ziesche
27 days ago
* hlbvh: more improvements and fixes:
  * collider: moved all buffer zero init to the start so we only need to sync once
  * collider: fixed sync in/after "build_aabbs"
  * collider: get rid of "colliding_triangles" ping/pong buffers, we run this synchronously with rendering, so one buffer is enough
  * collider: fixed use of temporary variables in radix sort parameters
  * collider: get rid of some unused kernel parameters
  * gl_renderer: this now actually works again (+made the rendering identical to the unified renderer)
  * unified_renderer/gl_renderer: instead of just red triangles (where collisions are detected), draw these triangles with a yellow-to-red gradient (since the collision is stored per-vertex, this gives a better impression of where the collision happened)
  * unified_renderer: more cleanup on exit
  * collider/unified_renderer/gl_renderer: will no longer return and pass the detected collisions, we don't actually need/want this -> we're accessing the "colliding_vertices" buffer anyways and rendering should/must only depend on the "triangle_vis" flag
  * Xcode: link against the release libfloor in release mode (not debug)
  * general reformatting / minor cleanup
a2flo
as Florian Ziesche
27 days ago
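The yellow-to-red gradient mentioned in the commit above can be pictured with a small sketch. This is not the project's shader code: the `rgb` struct, the `collision_color` name, and the assumption that the interpolated per-vertex collision value lies in [0, 1] (with 1 meaning "collision") are all hypothetical and only serve as illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical color type, not taken from the project.
struct rgb { float r, g, b; };

// Hypothetical mapping of an interpolated per-vertex collision value `t`:
// t == 0 -> yellow (1, 1, 0), t == 1 -> red (1, 0, 0).
inline rgb collision_color(float t) {
    assert(t == t && "collision value must not be NaN");
    t = std::min(std::max(t, 0.0f), 1.0f); // clamp to [0, 1]
    return { 1.0f, 1.0f - t, 0.0f };       // only the green channel fades out
}
```

Because the value is stored per vertex and interpolated across the triangle, a triangle with only one colliding vertex would fade from red at that vertex to yellow at the others under this mapping.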
* hlbvh/img: proper resource cleanup on exit
* warp: use libwarp_destroy() instead of libwarp_cleanup() on exit
* occ: added PTX 8.4 support
a2flo
as Florian Ziesche
about 1 month ago
* hlbvh: no ranges :(
a2flo
as Florian Ziesche
about 1 month ago
* hlbvh: improvements, fixes and general modernization:
  * replaced the old metal_renderer with a unified_renderer that works with both Metal and Vulkan
  * unified_renderer/misc: the renderer can be used together with a different compute backend -> the render and compute contexts may differ (e.g. Metal/Vulkan rendering, CUDA/Host-Compute hlbvh computations)
  * obj_loader: can now set additional COMPUTE_MEMORY_FLAGs when loading an .obj file + OpenGL sharing flag now also checks for Metal/Vulkan sharing flags (won't use OpenGL if any is specified) -> when loading from a compute context, loaded model data can now also be used in a render context that is different when sharing flags are set
  * set buffer/image debug labels everywhere
  * animation: ensure triangle count is < 65536 (need to guarantee this now to be able to use 16-bit indices)
  * animation: make use of new sharing sync functionality/flags for the "colliding_vertices" buffer (written on the compute side, read on the render side)
  * hlbvh: collide_bvhs() now puts the traversal stack into local memory instead of function scope (register) memory and uses 16-bit instead of 32-bit indices -> will now actually work in Vulkan where we can't use dynamic pointers into function scope arrays (no OpPtrAccessChain on Function storage class) - this may be faster on other backends now as well due to less register pressure
  * hlbvh: the required local size in collide_bvhs() is now computed via a constexpr function to a) demo that functionality and b) actually make use of that capability to make a more complex computation for it -> this is based on the available local memory size now, which should be known and constant (we use 64 * 2 == 128 bytes per work-item -> on CUDA devices this will compute a local size of 384 due to 48KiB of available local memory, on an Apple GPU this is likely to be 256 work-items due to 32KiB of available local memory)
  * hlbvh: removed the < sm_50 morton code implementation, the bit op variant should be the fastest on all modern devices
  * hlbvh: use the specific add reduction/scan algorithms instead of using the non-specific ones with "plus<> {}"
  * collider: stop flushing the logger in debug mode (this costs a lot of time)
  * collider: everything is now properly synchronized (either sync/blocking execution or explicit queue finish())
  * collider: implemented radix sort using an indirect compute pipeline (used if available, otherwise falls back to the direct approach) -> faster
  * hlbvh_shaders: removed unnecessary "repl_color" + uniforms data is now placed in an actual buffer
  * enabled non-blocking execution on Vulkan + disabled resource tracking on Metal (now that everything is properly sync'ed)
  * added --no-unified option to disable the unified renderer
  * the current frame time is now set as the window caption
  * added CMake support
  * updated copyright year
  * misc cleanup
a2flo
as Florian Ziesche
about 1 month ago
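The local size arithmetic quoted in the commit above (64 * 2 == 128 bytes per work-item, 384 work-items for 48KiB, 256 for 32KiB) can be reproduced with a short constexpr sketch. The function name `compute_local_size` and its parameters are hypothetical, and the real function in the project may apply further constraints (e.g. device maximums) that the commit message does not state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of work-items whose traversal stacks fit into the
// available local memory, assuming each work-item needs
// stack_entries * bytes_per_entry bytes (64 entries of 16-bit indices by default).
constexpr std::uint32_t compute_local_size(std::uint32_t local_mem_bytes,
                                           std::uint32_t stack_entries = 64u,
                                           std::uint32_t bytes_per_entry = 2u) {
    const std::uint32_t bytes_per_item = stack_entries * bytes_per_entry; // 64 * 2 == 128
    assert(bytes_per_item != 0u);
    return local_mem_bytes / bytes_per_item;
}

// The two cases named in the commit message, evaluated at compile time.
static_assert(compute_local_size(48u * 1024u) == 384u, "48KiB -> 384 work-items");
static_assert(compute_local_size(32u * 1024u) == 256u, "32KiB -> 256 work-items");
```

Evaluating this at compile time is only possible if the local memory size is a known constant for the target device, which is the precondition the commit message itself points out.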
* minor updates
a2flo
as Florian Ziesche
about 1 month ago
* img: improvements, fixes and general modernization:
  * replaced the old single-stage blur implementation with a better approach: this now runs with either a 32x32px, 16x16px or 8x8px local size (caching that size + tap count specific overlap, e.g. 46x46px, 30x30px or 22x22px in the default config) and no longer a tap count specific local size -> this is a) a lot faster, and b) actually runs on backends that have limitations on the total local size (e.g. must be a multiple of 32 or must be a power-of-two)
  * added support for running in float32 mode (default) or float16 mode (via --half startup parameter)
  * the previous "second cache" is now always active
  * profiling/timing now uses the compute_queue profiling functionality if available (more accurate timings!)
  * will now dynamically select the best single-stage blur kernel (based on device support)
  * make use of compute_queue::execution_parameters_t and compute_queue::execute_with_parameters to properly enforce the "wait until completion" behavior
  * flipped the OpenGL parameter: must now start with --with-opengl to enable and use OpenGL, otherwise the software rendering is always used
  * removed now unused options/defines
  * it is now enforced that the image dim must be a multiple of 32
  * added CMake support
  * updated README description + added example image
  * misc cleanup
  * NOTE: with these changes, the single-stage blur actually seems to perform better than the "dumb" blur on most devices
a2flo
as Florian Ziesche
about 2 months ago
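The cached tile sizes quoted in the commit above (46x46px, 30x30px, 22x22px for local sizes of 32x32px, 16x16px, 8x8px) are consistent with a 7px border on each side, i.e. a 15-tap filter. The tap count of 15 is inferred from those numbers and is not stated in the commit message; the helper name `cached_tile_dim` is likewise hypothetical. Under that assumption, the relationship can be sketched as:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: edge length of the cached tile for a square work-group of
// local_dim x local_dim pixels and an odd, centered tap_count-wide blur filter.
// Each side needs tap_count / 2 extra pixels of overlap.
constexpr std::uint32_t cached_tile_dim(std::uint32_t local_dim, std::uint32_t tap_count) {
    assert(tap_count % 2u == 1u); // a centered filter has an odd tap count
    const std::uint32_t radius = tap_count / 2u;
    return local_dim + 2u * radius;
}

// Assumed default config: 15 taps -> 7px overlap per side.
static_assert(cached_tile_dim(32u, 15u) == 46u, "32x32 local size caches 46x46");
static_assert(cached_tile_dim(16u, 15u) == 30u, "16x16 local size caches 30x30");
static_assert(cached_tile_dim(8u, 15u) == 22u, "8x8 local size caches 22x22");
```

Keeping the local size fixed and letting only the cached tile grow with the tap count is what allows the kernel to satisfy backends that restrict the total local size to a multiple of 32 or a power of two.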
* warp: unified renderer overhaul and modernization:
  * rendering is now multi-threaded: we create/run a thread per parallel/pipelined frame that may be active (right now: 2 frames, but this may be 3 in the future)
  * the main thread will now mostly only perform event handling and kick off the occasional frame rendering (main thread is throttled by a simple 500µs sleep for now)
  * set the new NO_RESOURCE_TRACKING and VULKAN_NO_BLOCKING context flags when creating the renderer compute_context (so that the per-frame objects actually have an effect in Metal and Vulkan)
  * unified_renderer: most of the render state is now allocated+stored per parallel frame (frame_object_t) and then of course only accessed by a single frame / render thread -> prevents any unnecessary synchronization or waiting between frames
  * unified_renderer: the renderer can now be used together with a different compute backend -> the render and compute contexts may differ now (e.g. Metal/Vulkan rendering, CUDA/Host-Compute warp computations)
  * unified_renderer: set proper sharing flags when creating the FBO images using the new SHARING_SYNC and SHARING_COMPUTE/RENDER_READ/WRITE flags (we either need to sync FBO images to the compute backend or need to sync the computed warp output to the render backend)
  * unified_renderer: added support for indirect rendering / indirect command pipelines (+added now required fences), which should a) generally be faster and b) not require any encoding at run-time
  * unified_renderer: added support for flushing the renderer (waits until in-flight frames are done and locks down rendering while active)
  * unified_renderer: due to the requirements of indirect command pipelines, all uniforms are now stored within a single per-frame uniform buffer, which is updated once per frame and then used by multiple shaders/kernels within that frame
  * unified_renderer: also need to encode the shadow image and skybox texture in argument buffers now
  * unified_renderer: added post_init() function that performs various initialization after the initial renderer init (-> will now use this to store a pointer to the model and camera, so that we don't need to specify these every time we want to render something)
  * unified_renderer: reduced shadow map dim from 16k to 8k (desktop) and 4k to 2k (iOS), we don't really need that much resolution and this takes up a lot of memory
  * unified_renderer: use VULKAN_HOST_COHERENT for the frame uniforms buffer (this significantly speeds up rendering, since we can directly write to it, instead of always allocating a tmp buffer for this each frame)
  * unified_renderer: the final frame present is now always blocking -> fixes the situation where a new frame (using the same object) might already be queued again, we don't want this, since it would require additional sync and isn't actually beneficial
  * unified_renderer: can now flag frame objects to let them rebuild their pipelines at the start of the next frame rendering (note that the renderer will be flushed for this)
  * unified_renderer: libwarp_camera_setup variable is now part of the renderer (so we don't accidentally pass a new variable (pointer) to libwarp that would lead to unnecessary recompilation)
  * unified_renderer: the decision whether a frame is a fully rendered frame or a warped frame is now done prior to the point where the frame is "enqueued", this way, we can now actually guarantee proper render/warp frame ordering and generally handle all of the different warp flags/state (note that each frame now also contains additional warp state so it knows what to do)
  * unified_renderer: use of argument buffers and indirect commands/rendering is now enabled by default
  * unified_renderer: added debug names/labels to more things
  * warp_shaders: updated to use array_param<> instead of just array<> for arrays of images
  * added --always-render option to only perform full frame rendering (instead of rendering + warping)
  * added --no-tessellation option to disable tessellation even if the device actually supports it
  * added --no-arg-buffer option to disable the use of argument buffers (also disables tessellation and indirect commands/rendering)
  * added --no-indrect option to disable the use of indirect commands/rendering
  * added key pad 0 - 5 key handling: these either display the correct color frame (0) or any of the debug visualizations (1-5)
  * unified_renderer/gl_renderer: added support for "debug blitting", i.e. can call new libwarp_debug_* (OpenGL) or internal (unified renderer) functions now that visualize the different warp buffers for debugging purposes
  * gl_renderer: generally simplify blitting
  * gl_renderer: libwarp_camera_setup variable is now global
  * camera: since the camera update and camera state query can happen from multiple threads now: store all important camera values in a camera_state_t object that can be updated and accessed safely from multiple threads (note that this is a ring buffer, so we generally shouldn't block other threads when updating)
  * camera: changed all float/single-precision variables to double/double-precision variables for higher accuracy
  * camera: moved camera state update into a separate function (this way an external caller can force an update)
  * auto_cam: force camera state update when running the auto cam now
  * updated copyright year
  * updated code according to latest libfloor changes
  * updated README
  * NOTE: this is optimized for low latency right now and we can easily get +20% more FPS by encoding/rendering the frames in parallel, but there are still some issues with that (in the future, I'll probably add an option to select between latency and bandwidth optimized rendering)
  * NOTE: requires libwarp >= v0.2.0 now
a2flo
as Florian Ziesche
about 2 months ago
* nbody: use new sharing flags
a2flo
as Florian Ziesche
about 2 months ago
* more build.sh updates
a2flo
as Florian Ziesche
2 months ago
* one more build script update
a2flo
as Florian Ziesche
2 months ago
* updated all build scripts
a2flo
as Florian Ziesche
2 months ago
* misc updates/improvements:
  * obj_loader: use span<> variant of create_image()
  * dnn: added CMake support
  * dnn: fixed int type cast warnings
  * dnn: enable Vulkan compilation again (this still throws a validation error, but it works on NVIDIA drivers, will fix this later)
  * dnn: use span<> variant of create_buffer()
  * hlbvh: removed unused variables
  * img: use span<> variant of create_image()
  * warp: use span<> variant of create_image()
  * warp: use commit_and_finish() instead of commit()
  * updated Xcode projects
a2flo
as Florian Ziesche
2 months ago
* nbody: removed Windows tile size workaround (can use 512 here as well now)
* updated config: removed exec_model host-compute option
* nbody/path_tracer: updated CMakeSettings.json to build with latest VS setup
a2flo
as Florian Ziesche
3 months ago
* occ: added handling of new x86 and ARM CPU tiers/targets
a2flo
as Florian Ziesche
4 months ago
* path_tracer: switched array parameter to array_param
a2flo
as Florian Ziesche
4 months ago
* config: added new host-compute options
a2flo
as Florian Ziesche
4 months ago
* updated README (getting there ...)
a2flo
as Florian Ziesche
4 months ago
* updated README
a2flo
as Florian Ziesche
4 months ago
* updated README
a2flo
as Florian Ziesche
4 months ago
* updated README
a2flo
as Florian Ziesche
4 months ago
* added new path tracer screenshots
a2flo
as Florian Ziesche
4 months ago
* migrated README to .asciidoc
a2flo
as Florian Ziesche
4 months ago
* path tracer improvements and modernization:
  * implemented some simple texture sampling support (this can be enabled by starting the program with the --with-textures parameter)
  * added some simple textures in data/textures/
  * improved random value computation (use better multiplier, use better seed computation, can use bit_cast<float> now)
  * uses execution_parameters_t and execute_with_parameters() now
  * can now reset everything by pressing 'R'
  * misc other code modernization
  * updated build.sh, CMake and Xcode project
  * now links against SDL2_image
a2flo
as Florian Ziesche
4 months ago
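The commit above mentions a random value computation that can use bit_cast<float>. The project's actual generator, multiplier and seeding are not shown in the listing, so the following is only a sketch of one common technique that a bit cast enables: turning 32 random bits into a float in [0, 1) by writing them into the mantissa of a value in [1, 2). The function names and the LCG constants (the well-known Numerical Recipes pair) are assumptions for illustration. `std::memcpy` is used so the sketch compiles before C++20; `std::bit_cast<float>(pattern)` would express the same reinterpretation in C++20.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");

// One step of a 32-bit linear congruential generator.
// Constants are illustrative (Numerical Recipes), not the project's multiplier.
inline std::uint32_t lcg_next(std::uint32_t& state) {
    state = state * 1664525u + 1013904223u;
    return state;
}

// Maps 32 random bits to a float in [0, 1): keep the top 23 bits as mantissa,
// combine them with the exponent of 1.0f to get a value in [1, 2), then subtract 1.
inline float bits_to_unit_float(std::uint32_t bits) {
    const std::uint32_t pattern = 0x3F800000u | (bits >> 9);
    float f;
    std::memcpy(&f, &pattern, sizeof(f)); // C++20: std::bit_cast<float>(pattern)
    assert(f >= 1.0f && f < 2.0f);
    return f - 1.0f;
}
```

A caller would draw a sample with `bits_to_unit_float(lcg_next(state))`; the conversion avoids an integer-to-float division and never returns exactly 1.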
* nbody/warp: updated shaders (can use in.position now instead of frag_coord workaround in Vulkan)
a2flo
as Florian Ziesche
4 months ago
* occ: added PTX 8.3 support
a2flo
as Florian Ziesche
5 months ago
* warp: also updated the quaternion handling here
a2flo
as Florian Ziesche
6 months ago
* nbody: updated quaternion-based rotation handling + fixed warning
a2flo
as Florian Ziesche
6 months ago