The latest upgrade to the Qwen family of models will include a mixture-of-experts version and one with just 600 million ...
ByteDance's Doubao AI team has open-sourced COMET, a Mixture of Experts (MoE) optimization framework that improves large language model (LLM) training efficiency while reducing costs. Already ...
Ant Group, the fintech affiliate of Alibaba, said its Ling-Plus-Base model can be ‘effectively trained on lower-performance devices’.
Mixture-of-experts (MoE), an architecture used in models such as DeepSeek-V3 and (reportedly) GPT-4o, addresses this challenge by splitting the model into a set of experts. During inference ...
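For illustration of the routing idea only, here is a minimal top-k MoE layer in PyTorch: a router scores the experts for each token, only the top-k experts actually run for that token, and their outputs are combined with the normalized router weights, so most parameters stay idle on any given token. The layer sizes, expert count, and k=2 choice are assumptions for this sketch, not details of DeepSeek-V3 or GPT-4o.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k mixture-of-experts layer (illustrative sketch)."""

    def __init__(self, d_model=64, d_hidden=128, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                       # x: (n_tokens, d_model)
        scores = self.router(x)                 # (n_tokens, n_experts)
        top_vals, top_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(top_vals, dim=-1)   # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e    # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

if __name__ == "__main__":
    layer = TopKMoE()
    tokens = torch.randn(10, 64)
    print(layer(tokens).shape)  # torch.Size([10, 64])
```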
DeepSeek, a leading Chinese AI firm, has improved its open-source V3 large language model, enhancing its coding and ...
Zhipu AI unveiled a free AI agent on Monday, joining a wave of similar launches in China's competitive AI market. The product ...
Microsoft stock faces a 25% downside due to AI growth concerns, high CapEx, and slowing Azure growth. See why we are bearish ...
Altera Corporation, a leader in FPGA innovations, today announced production shipments of its Agilex™ 7 FPGA M-Series, the industry's first high-end, high-density FPGA to feature integrated high ...
In large language model R&D, we shifted our focus in Q3 of last year to the KwaiYii MoE LLM, which has fewer parameters. The MoE model helped us maintain our model's overall performance and ...
DeepSeek’s industry-shaking breakthrough automates this final step, using a technique that rewards the AI model for doing the right thing. The Chinese company has also built smaller models that can be ...
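The snippet does not spell out the technique, so purely as a loose illustration (not DeepSeek's actual recipe): a generic REINFORCE-style update that nudges a toy "policy" toward the answer that earns a reward. The four-answer setup, reward rule, and learning rate are all invented for this sketch.

```python
import torch
import torch.nn.functional as F

# Toy setup: the "model" picks one of four candidate answers; only index 2 is correct.
logits = torch.zeros(4, requires_grad=True)
optimizer = torch.optim.SGD([logits], lr=0.5)
correct_answer = 2

for step in range(200):
    probs = F.softmax(logits, dim=-1)
    answer = torch.multinomial(probs, 1).item()   # sample an answer from the policy
    reward = 1.0 if answer == correct_answer else 0.0
    loss = -reward * torch.log(probs[answer])     # REINFORCE: raise log-prob of rewarded choices
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(F.softmax(logits, dim=-1))  # most probability mass ends up on the rewarded answer
```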