
Commits: Listings

Analyzed 1 day ago, based on code collected 1 day ago.
Aug 20, 2024 — Aug 20, 2025
Commit Message | Date
fix (#852) | 9 months ago
Generalizes decoder by taking input batch. (#861) | 9 months ago
Addresses Github Action warning. (#862) | 9 months ago
Unify init and prefill for attention layers. (#860) | 9 months ago
Implement ConvXDTranspose (#853) | 9 months ago
[Bug Fix] Fix a nit issue for Jax rollback (#856) | 9 months ago
snapshot (#854) | 9 months ago
Generalizes pre- and post-spmd setup with init modules. (#848) | 9 months ago
Fix RLHF slowdown in attention multi steps extend_step. (#849) | 9 months ago
Conv1D supports paddings. (#847) | 9 months ago
Ensure iterators are saved in per-process dir. (#846) | 9 months ago
Add GitHub Actions Workflows for PR Validation (#795) | 9 months ago
Minor cleanup. (#843) | 9 months ago
Hardcode metadata.google.internal ip address to avoid transient DNS resolution issue (#844) | 9 months ago
CAUSAL padding=(dilate_window - stride, stride - 1), not (dilate_window - dilate_stride, dilate_stride - 1) (#841) | 9 months ago
Fix the changed MQA behavior by #837. (#842) | 9 months ago
Optimize MQA computation. (#837) | 9 months ago
Implement custom `max_data_shard_degree` and `shard_threshold_bytes` (#838) | 9 months ago
Updates orbax and adds support for max save/restore concurrent gb. (#834) | 9 months ago
Transformer extend_step supports multi steps generation (2/2). (#836) | 9 months ago
Remove stale moveaxis optimization in attention. (#835) | 9 months ago
Implement sequence_mask(). (#832) | 9 months ago
Transformer extend_step supports multi steps generation. (#831) | 9 months ago
Skip dst dir creation if no tf savables. (#830) | 9 months ago
Removes legacy bias check for flash attention. (#829) | 9 months ago
Speed up FA Backward pass in GPU via parallelizing sequence dimension (#818) | 9 months ago
Add bf16 test to subsampler. (#827) | 9 months ago
Quantizer returns ids as int32, not float32. (#826) | 9 months ago
[GKE]: support priority class (#828) | 9 months ago
Remove cleanup on save. (#825) | 9 months ago
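
Commit #841 above corrects the CAUSAL padding formula for strided, dilated convolutions to (dilate_window - stride, stride - 1). As a hedged sketch of that arithmetic (the helper names below are hypothetical, not the repository's API), the corrected padding keeps the output length at ceil(in_len / stride) regardless of window size or dilation:

```python
import math


def causal_padding(window: int, stride: int, dilation: int = 1) -> tuple[int, int]:
    """(left, right) padding per commit #841: (dilate_window - stride, stride - 1)."""
    # Effective receptive field of a dilated window.
    dilate_window = (window - 1) * dilation + 1
    return (dilate_window - stride, stride - 1)


def conv_output_len(
    in_len: int, window: int, stride: int, dilation: int, pad: tuple[int, int]
) -> int:
    """Standard convolution output-length formula."""
    dilate_window = (window - 1) * dilation + 1
    return (in_len + pad[0] + pad[1] - dilate_window) // stride + 1


# With the corrected padding, output length matches ceil(in_len / stride)
# for every window/stride/dilation combination checked here.
for in_len in (7, 8, 16):
    for window, stride, dilation in [(3, 1, 1), (3, 2, 1), (5, 2, 2)]:
        pad = causal_padding(window, stride, dilation)
        out_len = conv_output_len(in_len, window, stride, dilation, pad)
        assert out_len == math.ceil(in_len / stride)
```

Note that for stride 1 this reduces to the familiar left-only causal padding (dilate_window - 1, 0), so each output frame depends only on current and past inputs.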