
Commits : Listings

Analyzed about 18 hours ago, based on code collected about 21 hours ago.
Apr 24, 2023 — Apr 24, 2024
Commit Message Contributor Files Modified Lines Added Lines Removed Code Location Date
* Metal: updated + extended reflection info dumping:
  * floor: added toolchain.metal.dump_reflection_info config entry flag to enable this (disabled by default)
  * metal_program/metal_pipeline: will now query and dump reflection info / bindings when dump_reflection_info is true
  * metal_program/metal_kernel: moved old reflection handling from metal_kernel to metal_program
  * metal_program: made the reflection handling compatible with the new MTLBinding system + extended it to handle all function types and their parameters
a2flo
as Florian Ziesche
15 days ago
* metal_kernel/llvm_toolchain: in argument buffers in Metal, arrays of buffers now also use the BUFFER_ARRAY type and have their size set to the #elements in the array (instead of the physical size in bytes and no array info) -> we no longer need to query reflection data when creating an argument buffer (which was deprecated) and can now compute all of the required info ourselves
* floor: more cleanup: removed now unnecessary reload_kernels() + flag, swap() and start_frame(), and removed the "window_swap" parameter from end_frame()
a2flo
as Florian Ziesche
15 days ago
* 14.0 toolchain update: Metal updates:
  * drop support for Metal 2.x, Metal 3.0 is now the minimum and default target -> removed version checks and obsolete code in various places
  * removed Metal NVIDIA workarounds (no longer needed, since there is no NVIDIA GPU supporting Metal 3.0)
  * CGCall/CodeGenModule/MetalFinal: metal kernels now always have 10 parameters (we always have sub-group/SIMD support)
  * libfloor metadata: treat arrays of buffers inside argument buffers the same way as in Vulkan -> sets the BUFFER_ARRAY type and size to #elements now instead of the physical size in bytes (which we can still get by multiplying by 8), this makes things easier on the libfloor/host side
  * MetalTypes: added APPLE_PLATFORM enum (via https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/BinaryFormat/MachO.def#L123)
  * MetalLibWriterPass/metallib-dis: make use of new APPLE_PLATFORM enum instead of using magic numbers
  * TypePrinter: added ComputeKernelDim/ComputeKernelWorkGroupSize support
a2flo
as Florian Ziesche
15 days ago
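The BUFFER_ARRAY size change in the commit above comes down to simple arithmetic, sketched here. The function name is hypothetical and not a libfloor API; the only fact taken from the commit message is that the physical size in bytes is the element count multiplied by 8.

```cpp
#include <cstdint>

// Hypothetical helper (not part of libfloor): recovers the physical size in bytes
// of an array of buffers from its element count, using the factor of 8 that the
// commit message states.
constexpr std::uint64_t buffer_array_physical_size(std::uint64_t element_count) {
    return element_count * 8u;
}

static_assert(buffer_array_physical_size(4u) == 32u, "4 elements occupy 32 bytes");
```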
* spring cleaning 2024 + bump version to 0.4.0-a1:
  * removed all OpenGL code/support
  * removed OpenAL/audio support
  * removing networking (asio/openssl/crypto)
  * removed CUDA < 12.0 / PTX < 8.0 and sm_3x/Kepler support, CUDA 12.0+ / PTX 8.0+ with Maxwell/sm_50 is now required
  * drop all Metal 2.x, < iOS 16.0 and < macOS 13.0 support (-> updated/removed various compile-time and run-time version checks)
  * metal_compute: requires a Metal 3.0 capable GPU (and Mac2 on macOS) now -> various device support flags are now true by default
  * bump macOS requirement to 13.0 and iOS requirement to 16.0
  * SDL3 migration:
  * built against the 3.1.1 preview right now
  * updated all the places that query the underlying native system window/display/etc.
  * use SDL_PLATFORM_* instead of SDL_VIDEO_DRIVER_* defines
  * use run-time video driver checks/handling in places where this is necessary
  * vulkan_compute: initial Wayland support (surface extension + surface creation, but still untested)
  * event: mouse events now mostly contain float2 coords + coord scaling is done in floating point now
  * event: removed "pressure" on mouse events (we have proper pen events now in SDL -> still need to implement this)
  * event: misc minor event changes/updates
  * floor: removed our manual Windows DPI scale/awareness handling, this is implemented in SDL now (SDL_WINDOWS_DPI_AWARENESS hint *before* SDL_Init())
  * floor: init more SDL sub-systems by default (timer, joystick, haptic, gamepad, sensor)
  * floor: use SDL_GetWindowSizeInPixels() to query the actual window size in pixels in get_physical_width/height/screen_size()
  * floor: get_scale_factor() now calls SDL_GetWindowDisplayScale() on all platforms except macOS/iOS (where we handle this ourselves)
  * CMake: removed use of SDL3main, this is no longer needed
  * updated build.sh / Xcode / CMake accordingly
  * symbol renaming done via rename_symbols.py
  * upgraded to C++23:
  * all code compiles in gnu++23/gnu++2b mode now (including device code)
  * disable all C++23 compat warnings
  * removed cpp_consteval.hpp and cpp_bitcast.hpp, we can now always use C++ standard functionality for that
  * bump CMake requirement to CMake 3.20
  * note that building against libstdc++ requires GCC 13.0+ now
  * also set the C target to gnu17
  * minor cleanup
  * clang/LLVM/libc++ 16.0+ are required for compilation now
  * Xcode 15.0+ is required for compilation now
  * removed handling of older toolchains
  * use "#pragma once" instead of manual header guards, excluding device common.hpp (need to include from CLI and as pch) and essentials.hpp (must be able to include more than once)
  * vulkan_compute: implemented window resizing (renderer reinit) support
  * vulkan_compute: removed get_swapchain_image_count() / get_swapchain_image_view()
  * llvm_toolchain: we no longer need NVIDIA workarounds on Metal 3.0+
  * use filesystem::remove instead of manual 'rm' system calls
  * floor: when renderer selection fails (because no toolchain exists), abort right away
  * floor: removed acquire_context()/release_context() and related OpenGL-only functions
  * removed obsolete SDL pressure patch
  * essentials: removed Host-compute "constant" define hack/workaround
  * events: removed KERNEL_RELOAD/SHADER_RELOAD and kernel_reload_event/shader_reload_event
  * CMake: reorder include directory order (put all as "AFTER")
  * build.sh/CMake: silence clang 18 warning about "missing" designated field initializers, since this also triggers on intentionally default-constructed fields (note that for clang 18, we unfortunately need to fully disable all missing field initializer warnings, since only clang 19 added a specific -Wno-missing-designated-field-initializers for this)
  * ignore -Wswitch-default warnings (conflicts with the other switch warning)
  * build.sh/CMake/Xcode: added -Wno-nan-infinity-disabled due to fast-math UD
  * build.sh: preempt libc++ header include path + remove /usr/include include paths, since these can interfere with compiler includes
  * build.sh: support OpenCL MSYS2/MinGW system packages
  * build.sh: properly detect macOS on arm64
  * build.sh: switch to gnu++2b instead of gnu++23 for compat reasons
  * build.sh: use dwarf-4 instead of dwarf-2 by default
  * build.sh: don't target sse4.1 on macOS/iOS
  * CMake: removed duplicated metal_args.hpp + added missing vulkan_args.hpp
  * fixed cuda_api compilation (misplaced #endif)
  * opencl_image: stencil images are no longer supported (this was only possible on shared OpenGL images), throw when this is specified during creation
  * const_math: remove "const" attribute from pure functions (pure is a superset)
  * aligned_ptr: added missing <string> include on Windows
  * enable/set Metal device features/props by default: 32KiB local memory, 1024 max local size, sub-group (shuffle) support, SIMD reduction, 32-bit float atomics, tessellation with 64 factor, image cube functionality, indirect command support, primitive ID support
  * universal_binary: bump everything to v3 and update the Metal target (removed everything that is no longer optional + added platform target (macOS and iOS right now))
  * metal_queue: profiling is now always supported
  * floor: set vulkan_api_version to 1.3.231 since this is the required version
  * updated Xcode project (need to set CONFIGURATION_BUILD_DIR and SYMROOT now)
  * soft_f16: disable native fp16 support on x86 macOS
  * darwin_helper: enable HDR support on iOS (not tested yet)
  * more obsolete Metal/macOS/iOS code removal
  * various Xcode updates
a2flo
as Florian Ziesche
17 days ago
* 14.0 toolchain: disable LTO build on MinGW/MSYS2 since I can't get it to work
a2flo
as Florian Ziesche
27 days ago
* version bump to v0.3.0-f1
a2flo
as Florian Ziesche
27 days ago
* updated README with latest example output for each target + updated example binaries
a2flo
as Florian Ziesche
27 days ago
* vulkan_queue: renamed "experimental_no_blocking" -> "no_blocking", this is no longer experimental
* floor_version.hpp: updated VS check for VS2022
a2flo
as Florian Ziesche
27 days ago
* fixed iOS compilation
a2flo
as Florian Ziesche
27 days ago
* 14.0 toolchain: going for release:
  * added + package licenses of (hopefully) all libraries/code used in the toolchain
  * enabled LTO build by default (can be disabled by -no-lto)
  * removed libz3.dll and libgomp-1.dll from Windows toolchain packaging and deployment (no longer needed)
  * ignore -Wunused-but-set-variable warnings
a2flo
as Florian Ziesche
27 days ago
* 14.0 toolchain update: Vulkan improvements:
  * SPIRVWriter: Vulkan: added support for translating pointer comparisons via OpPtrDiff (<, <=, >=, >)
  * OCLToSPIRV: Vulkan: always use acquire-release semantics in atomics when none or sequentially-consistent was specified
a2flo
as Florian Ziesche
28 days ago
* CUDA/Metal/Vulkan: proper memory ordering in atomics:
  * since hardware has now actually implemented support for this (there are actual functional differences), we need to properly specify this now
  * always use acquire-release semantics on all atomic operations for now, since this provides the most guarantees and is supported across all backends
  * in the future, I will probably add more fine-grained control over this
  * NOTE: Metal doesn't officially support anything but "relaxed" ordering, but the compiler and hardware do support other modes -> use acquire-release semantics with Metal 2.4 onwards
  * NOTE: toolchain update incoming
* compute_algorithm: fixed #elements estimation for scan algorithms for non-sub-group implementations
* Metal/device: renamed FLOOR_METAL_MEM_SCOPE_* -> FLOOR_METAL_MEM_FLAGS_*, removed old comment, and added FLOOR_METAL_SYNC_SCOPE_SUB_GROUP and FLOOR_METAL_MEM_FLAGS_OBJECT_DATA to reflect the current Apple naming and functionality
a2flo
as Florian Ziesche
28 days ago
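As a host-side illustration of the acquire-release ordering described in the commit above, here is a minimal sketch using standard C++ atomics. It is not the libfloor device code; both helper names are hypothetical. In standard C++, memory_order_acq_rel applies to read-modify-write operations such as the two shown here.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (not libfloor API): atomic add with acquire-release ordering.
// Returns the value stored before the addition.
inline std::uint32_t atomic_add_acq_rel(std::atomic<std::uint32_t>& target,
                                        std::uint32_t value) {
    return target.fetch_add(value, std::memory_order_acq_rel);
}

// Hypothetical helper (not libfloor API): atomic max built from a compare-exchange
// loop, acquire-release on success and acquire on failure.
// Returns the value observed before any update.
inline std::uint32_t atomic_max_acq_rel(std::atomic<std::uint32_t>& target,
                                        std::uint32_t value) {
    std::uint32_t observed = target.load(std::memory_order_acquire);
    while (observed < value &&
           !target.compare_exchange_weak(observed, value,
                                         std::memory_order_acq_rel,
                                         std::memory_order_acquire)) {
        // On failure, compare_exchange_weak reloads the current value into 'observed'.
    }
    return observed;
}
```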
* print more informative error messages when kernel execution fails
a2flo
as Florian Ziesche
29 days ago
* bump build requirements: now requires a clang/LLVM 13.0+ toolchain (or Xcode / CLI tools 13.3)
* host_atomic: make use of floating point add/sub and integer min/max atomics
* version bump to v0.3.0-b4
a2flo
as Florian Ziesche
29 days ago
* 14.0 toolchain update: CGCall: fixed invalid bitcast by using an address space cast instead
a2flo
as Florian Ziesche
about 1 month ago
* const_math/rt_math/host: make clz(0)/ctz(0) work and return the same everywhere and at compile-time by manually handling 0 and returning the expected values (__builtin_clz/ctz(0) are not considered compile-time constants + return values may differ between x86 and ARM)
a2flo
as Florian Ziesche
about 1 month ago
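The zero handling described in the commit above can be sketched as follows. This is an illustration, not the libfloor implementation: the function names are hypothetical, the choice of 32 (the bit width) as the result for 0 is an assumption, and a portable loop stands in for __builtin_clz/ctz so that every input, including 0, evaluates at compile time.

```cpp
#include <cstdint>

// Hypothetical sketch: count leading zeros of a 32-bit value.
// Assumption: 0 maps to the bit width (32); the commit only says "expected values".
constexpr std::uint32_t clz32(std::uint32_t value) {
    if (value == 0u) return 32u; // handled manually, identical on every platform
    std::uint32_t count = 0u;
    for (std::uint32_t mask = 0x80000000u; (value & mask) == 0u; mask >>= 1u) ++count;
    return count;
}

// Hypothetical sketch: count trailing zeros of a 32-bit value, same zero convention.
constexpr std::uint32_t ctz32(std::uint32_t value) {
    if (value == 0u) return 32u;
    std::uint32_t count = 0u;
    for (std::uint32_t mask = 1u; (value & mask) == 0u; mask <<= 1u) ++count;
    return count;
}

static_assert(clz32(0u) == 32u && ctz32(0u) == 32u, "zero is a compile-time constant case");
static_assert(clz32(1u) == 31u && ctz32(8u) == 3u, "non-zero inputs count as usual");
```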
* 14.0 toolchain update: various improvements/fixes:
  * removed VulkanUtils.h and moved functions into FloorUtils.h -> updated all users
  * FloorUtils: split the 32-bit integer simplification from simplify_gep_indices() into separate functions: simplify_integer_to_32bit() that does exactly that and simplify_const_integer_to_32bit() that only does this on constant integers
  * VulkanPreFinal: implemented lowering of llvm.memcpy instructions into loops (we need to do this since Vulkan/SPIR-V doesn't have a proper memcpy operation)
  * SPIRVWriter: when translating memcpy for Vulkan/SPIR-V, check if the copy length is larger than 1, abort if so (OpCopyMemory can only copy a single value)
  * LLVMToSPIRVTransformations: r/vulkan_utils/libfloor_utils/
a2flo
as Florian Ziesche
about 1 month ago
* llvm_toolchain/function_info: clarify that if a local size is set, it is the *required* local size -> renamed + updated all users
* llvm_toolchain/function_info: added get_kernel_dim() helper function to query the kernel dimensionality (if the function is a kernel, returns 1 otherwise / by default)
a2flo
as Florian Ziesche
about 1 month ago
* 14.0 toolchain update: various improvements/fixes:
  * added FloorUtils.h: this currently implements helper functions that iterate over all users (or user instructions) of an llvm::Value in a general way, handling both direct users and single-indirection users of constant expressions
  * -> use new libfloor_utils::for_all_instruction_users/for_all_users everywhere where we iterate over the users of an instruction or GV (or others)
  * AddressSpaceFix: added trivial handling/replacement of llvm.memcpy intrinsics (even if fix_call_instrs is not set / can't be used)
  * MetalLibWriterPass: fixed incorrect language version (must be 3.1.0) + updated Metal compiler identity when building for Metal 3.1 + reformat
  * SPIRFinal: erase experimental_noalias_scope_decl LLVM intrinsics
  * when generating SPIR-V, we now set a "floor.generating_spirv" named metadata for easier detection
  * added SPIRFinal module pass: this is only run in SPIR mode (not SPIR-V!) to fix up global variables in the wrong address space (all must be constant or local)
  * SPIRVContainerWriterPass/SPIRVWriter: fixed detection of global variables being used inside a function -> uses new helper function that now also handles usage inside constant expressions
  * SPIRVInstruction: SPIRVMemoryAccess: added support for scope (make pointer available/visible)
  * SPIRVWriter: loads/stores of pointers in storage buffer address space are now marked with make pointer available/visible and non-private pointer flags/masks
a2flo
as Florian Ziesche
about 1 month ago
* llvm_toolchain: actually make OpenCL pch compilation work
* llvm_toolchain: when printing the SPIR-V validator output, specify which target was used (Vulkan or OpenCL)
* const_string: make _cs UDL work on compute backends (need to put the string in constant address space)
a2flo
as Florian Ziesche
about 1 month ago
* 14.0 toolchain update: SPIR-V updates:
  * Vulkan: updated/ported dxil-spirv CFG structurizer to the latest version (now at d6cff9039956d6f461625b01981c541eb724088c)
  * this now has initial support for loop/selection control masks (note that this isn't set from the outside yet)
  * SPIRVWriter: added handling of selection/loop control masks in floor.selection_merge/loop_merge
  * updated SPIR-V Tools to latest version (now @libfloor_202403 branch based on f20663ca7fec48fdc88e4c4d7c5889f8b4cc5664)
a2flo
as Florian Ziesche
about 1 month ago
* vulkan_args: in debug mode, when checking for the argument type when setting a buffer arg, we need to ignore implicit args + added asserts in places where "is_implicit" is not expected
* cuda_buffer/cuda_image: fixed potential nullptr access
a2flo
as Florian Ziesche
about 1 month ago
* floor: added is_initialized() helper function to check if libfloor was already initialized
a2flo
as Florian Ziesche
about 1 month ago
* 14.0 toolchain update: added CUDA 12.4 and PTX 8.4 support
a2flo
as Florian Ziesche
about 1 month ago
* added CUDA 12.4 and PTX 8.4 support
a2flo
as Florian Ziesche
about 1 month ago
* compute_queue: made the current execute_with_handler() / execute_cooperative_with_handler() an overload of execute() / execute_cooperative() instead -> queue.execute(kernel, completion_handler, ...)
* compute_queue: added execute_sync() and execute_cooperative_sync() that perform a blocking execution (same as execute_with_parameters() with "wait_until_completion" set to true)
* compute_queue: added is_valid_work_size_type() helper function to simplify work_size_type checking
* cuda_program: .reqntid does not actually enforce the local size when querying the max-threads-per-block of a function -> do this ourselves now + fail the kernel if the reported max total local size is actually smaller than we expected
* cuda_device/metal_device/opencl_device/vulkan_device: set/init minimum expected local memory size (>= 16KiB)
* device_info/llvm_toolchain: added dedicated_local_memory() helper function / FLOOR_COMPUTE_INFO_DEDICATED_LOCAL_MEMORY define that are set to the local memory size that a device supports
* metal_compute: use public maxThreadgroupMemoryLength instead of private maxComputeThreadgroupMemory to query the local memory size
a2flo
as Florian Ziesche
about 1 month ago
* more get_underlying_metal_buffer_safe() / get_underlying_vulkan_buffer_safe() fixes/replacements
a2flo
as Florian Ziesche
about 2 months ago
* vulkan_args: in debug mode, all per-argument checks will now throw and be caught in set_arguments(), which will then print a more informative error (now including function name and argument index) and return false from set_arguments(), which is the intended error path
* vulkan_args: added more argument checks in debug mode (will now test if the argument has the correct type for must variants)
* vulkan_args: added nullptr checks to image/buffer array elements + buffer array elements may actually be nullptr now
* cuda_buffer/cuda_image: fixed unused attributes in release mode
a2flo
as Florian Ziesche
about 2 months ago
* 14.0 toolchain update:
  * Vulkan: fixed nullptr check when checking for / handling argument buffers
  * made all bug report URLs point to the floor_llvm repo
a2flo
as Florian Ziesche
about 2 months ago
* cuda_program: ignore kernels that use too much local/shared memory (we only support static local/shared memory, not dynamic memory, so this is a hard limit for now)
a2flo
as Florian Ziesche
about 2 months ago