News
ByteDance's Doubao AI team has open-sourced COMET, a Mixture of Experts (MoE) optimization framework that improves large language model (LLM) training efficiency while reducing costs. Already ...
Compared to DeepSeek R1, Llama-3.1-Nemotron-Ultra-253B shows competitive results despite having fewer than half the parameters.
The latest upgrade to the Qwen family of models will include a mixture-of-experts version and one with just 600 million ...
Alibaba Cloud, a subsidiary of BABA-W (09988.HK), held its 2025 Spring Launch Event, announcing the offerings of more ...
Llama 4 Scout is a compact yet highly capable model, boasting 17 billion active parameters and 16 experts. Its standout ...
Meta has debuted the first two models in its Llama 4 family, its first to use mixture of experts tech. A Saturday post from ...
Meta presented the latest generation of its large language models (LLM) at the weekend: Llama 4 Scout and Llama 4 Maverick.