
Commits: Listings

Analyzed 1 day ago, based on code collected 1 day ago.
Aug 20, 2024 — Aug 20, 2025
Commit Message | Date
fix (#852) | 9 months ago
Generalizes decoder by taking input batch. (#861) | 9 months ago
Addresses Github Action warning. (#862) | 9 months ago
Unify init and prefill for attention layers. (#860) | 9 months ago
Implement ConvXDTranspose (#853) | 9 months ago
[Bug Fix] Fix a nit issue for Jax rollback (#856) | 9 months ago
snapshot (#854) | 9 months ago
Generalizes pre- and post-spmd setup with init modules. (#848) | 9 months ago
Fix RLHF slowdown in attention multi steps extend_step. (#849) | 9 months ago
Conv1D supports paddings. (#847) | 9 months ago
Ensure iterators are saved in per-process dir. (#846) | 9 months ago
Add GitHub Actions Workflows for PR Validation (#795) | 9 months ago
Minor cleanup. (#843) | 9 months ago
Hardcode metadata.google.internal ip address to avoid transient DNS resolution issue (#844) | 9 months ago
CAUSAL padding=(dilate_window - stride, stride - 1), not (dilate_window - dilate_stride, dilate_stride - 1) (#841) | 9 months ago
Fix the changed MQA behavior by #837. (#842) | 9 months ago
Optimize MQA computation. (#837) | 9 months ago
Implement custom `max_data_shard_degree` and `shard_threshold_bytes` (#838) | 9 months ago
Updates orbax and adds support for max save/restore concurrent gb. (#834) | 9 months ago
Transformer extend_step supports multi steps generation (2/2). (#836) | 9 months ago
Remove stale moveaxis optimization in attention. (#835) | 9 months ago
Implement sequence_mask(). (#832) | 9 months ago
Transformer extend_step supports multi steps generation. (#831) | 9 months ago
Skip dst dir creation if no tf savables. (#830) | 9 months ago
Removes legacy bias check for flash attention. (#829) | 9 months ago
Speed up FA Backward pass in GPU via parallelizing sequence dimension (#818) | 9 months ago
Add bf16 test to subsampler. (#827) | 9 months ago
Quantizer returns ids as int32, not float32. (#826) | 9 months ago
[GKE]: support priority class (#828) | 9 months ago
Remove cleanup on save. (#825) | 9 months ago
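
Commit #841 above corrects the CAUSAL padding formula for strided, dilated convolutions to (dilate_window - stride, stride - 1). As a hedged sketch of that arithmetic (the helper names below are hypothetical, not the repository's API), the corrected padding keeps the output length at ceil(in_len / stride) regardless of window size or dilation:

```python
import math


def causal_padding(window: int, stride: int, dilation: int = 1) -> tuple[int, int]:
    """(left, right) padding per commit #841: (dilate_window - stride, stride - 1)."""
    # Effective receptive field of a dilated window.
    dilate_window = (window - 1) * dilation + 1
    return (dilate_window - stride, stride - 1)


def conv_output_len(
    in_len: int, window: int, stride: int, dilation: int, pad: tuple[int, int]
) -> int:
    """Standard convolution output-length formula."""
    dilate_window = (window - 1) * dilation + 1
    return (in_len + pad[0] + pad[1] - dilate_window) // stride + 1


# With the corrected padding, output length matches ceil(in_len / stride)
# for every window/stride/dilation combination checked here.
for in_len in (7, 8, 16):
    for window, stride, dilation in [(3, 1, 1), (3, 2, 1), (5, 2, 2)]:
        pad = causal_padding(window, stride, dilation)
        out_len = conv_output_len(in_len, window, stride, dilation, pad)
        assert out_len == math.ceil(in_len / stride)
```

Note that for stride 1 this reduces to the familiar left-only causal padding (dilate_window - 1, 0), so each output frame depends only on current and past inputs.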