Hacker News | cs-fan-101's comments

Cerebras and Opentensor are pleased to announce BTLM-3B-8K (Bittensor Language Model), a new state-of-the-art 3 billion parameter open-source language model that achieves breakthrough accuracy across a dozen AI benchmarks.

BTLM-3B-8K Highlights:

• 7B-level model performance in a 3B model
• State-of-the-art 3B parameter model
• Optimized for long sequence length inference, 8K or more
• First model trained on SlimPajama, the largest fully deduplicated open dataset
• Runs on devices with as little as 3GB of memory when quantized to 4-bit
• Apache 2.0 license for commercial use

BTLM fits on mobile and edge devices with as little as 3GB of memory, helping democratize AI access to billions of devices worldwide.
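
For anyone who wants to try the 4-bit path, here is a minimal sketch of loading BTLM-3B-8K through Hugging Face transformers with bitsandbytes quantization. The repo id and flags below are assumptions based on the model card linked below; check the card for the exact recommended usage.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "cerebras/btlm-3b-8k-base"  # assumed Hugging Face repo id

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # 4-bit weights -> roughly 3 GB footprint
        bnb_4bit_compute_dtype=torch.bfloat16,  # do the matmuls in bf16
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
        trust_remote_code=True,  # BTLM ships a custom model class
    )

    prompt = "The Bittensor network is"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(out[0], skip_special_tokens=True))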

BTLM was commissioned by the Opentensor foundation for use on the Bittensor network. Bittensor is a blockchain-based network that lets anyone contribute AI models for inference, providing a decentralized alternative to centralized model providers like OpenAI and Google. Bittensor serves over 4,000 AI models with over 10 trillion model parameters across the network.

BTLM was trained on the newly unveiled Condor Galaxy 1 (CG-1) supercomputer, the first public deliverable of the G42-Cerebras strategic partnership. Cerebras acknowledges the generous support of G42 Cloud and the Inception Institute of Artificial Intelligence. Cerebras also thanks Cirrascale, who first introduced OpenTensor to Cerebras and provided additional technical support. Finally, Cerebras thanks the Together AI team for the RedPajama dataset.

To learn more, check out the following:

- Blog: https://hubs.li/Q01YG_yr0
- Model on Hugging Face: https://hubs.li/Q01YG_HS0


[Cerebras employee here] Condor Galaxy 1 can support beyond 600 billion parameters. In the standard config it's 600B, but it can scale to train upwards of 100T-parameter models.


Wow, that's a lot! Awesome info. Thank you!


Cerebras announced today that it has built and sold a 4 exaFLOPS AI Supercomputer, named Condor Galaxy 1 (CG-1), to its strategic partner G42, the Abu Dhabi-based AI pioneer.

Located in Santa Clara, CA, CG-1 is the first of nine interconnected 4 exaFLOPS AI supercomputers to be built for G42. Together these will deliver an unprecedented 36 exaFLOPS of AI compute and are expected to be the largest constellation of interconnected AI supercomputers in the world.

Condor Galaxy 1 (CG-1) is now up and running with 2 exaFLOPS and 27 million cores, built from 32 Cerebras CS-2 systems linked together into a single, easy-to-use AI supercomputer. While this is currently one of the largest AI supercomputers in production, in the coming weeks, CG-1 will double in performance with its full deployment of 64 Cerebras CS-2 systems, delivering 4 exaFLOPS of AI compute and 54 million AI optimized compute cores.
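
As a quick sanity check on those totals (assuming performance and core count scale linearly with system count), the implied per-CS-2 numbers work out as follows:

    # Implied per-system numbers at full deployment (64 CS-2s).
    full_systems = 64
    full_exaflops = 4.0    # AI (FP16) exaFLOPS
    full_cores = 54e6      # AI-optimized compute cores

    per_cs2_pflops = full_exaflops * 1000 / full_systems  # ~62.5 PFLOPS per CS-2
    per_cs2_cores = full_cores / full_systems              # ~845,000 cores per CS-2

    print(f"~{per_cs2_pflops:.1f} PFLOPS and ~{per_cs2_cores:,.0f} cores per CS-2")
    # Consistent with today's half deployment: 32 systems -> 2 exaFLOPS, ~27M cores.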

Upon completion of Condor Galaxy 1 (CG-1), Cerebras and G42 will build two more US-based 4 exaFLOPS AI supercomputers and link them together, creating a 12 exaFLOPS constellation. Cerebras and G42 then intend to build six more 4 exaFLOPS AI supercomputers for a total of 36 exaFLOPS of AI compute by the end of 2024.

Offered by G42 and Cerebras through the Cerebras Cloud, Condor Galaxy 1 (CG-1) delivers AI supercomputer performance without having to manage or distribute models over GPUs. With CG-1, users can quickly and easily train a model on their data and own the results.

* Press release: https://www.cerebras.net/press-release/cerebras-and-g42-unve...

* Blog: https://www.cerebras.net/blog/introducing-condor-galaxy-1-a-...


Recently, we announced in this post (https://news.ycombinator.com/item?id=35343763#35345980) the release of Cerebras-GPT — a family of open-source GPT models trained on the Pile dataset using the Chinchilla formula. Today, we are excited to announce the availability of the Cerebras-GPT research paper on arXiv.


Thank you for open sourcing these models!

I noticed that the sizes of the models are relatively small (13B max). Is this an inherent limitation, or is training a bigger model possible and just hasn't been done in this exercise?


Someone else can answer this better than I, so I'll probably end up deleting this in an hour or two. But I think the purpose of this research was not to create an excellent GPT model. I believe it was to explore the scaling effects on Cerebras hardware and determine a helpful framework for compute-optimal training regimes so that customers who might use Cerebras hardware can be confident that:

1) Standard AI/ML scaling assumptions still apply on this hardware.

2) They have a starting point for hyper-parameter estimation and can get better results sooner (rough sketch below).
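
To make (2) a bit more concrete, here is a simplified sketch of how μP-style ("μTransfer") hyperparameter scaling works: you tune on a narrow proxy model and rescale for the target width. The base values and exact rules below are illustrative assumptions, not the paper's precise recipe.

    def mup_scaled_hparams(width, base_width=256, base_lr=6e-3, base_init_std=0.02):
        """Tune on a small proxy of `base_width`, then rescale for `width` (muP-style)."""
        m = width / base_width  # width multiplier
        return {
            "hidden_lr": base_lr / m,                   # matrix-like weights: Adam LR ~ 1/width
            "embedding_lr": base_lr,                    # embeddings keep the base LR
            "hidden_init_std": base_init_std / m**0.5,  # init std shrinks ~ 1/sqrt(width)
            "output_multiplier": 1.0 / m,               # logits get a 1/width multiplier
        }

    print(mup_scaled_hparams(width=2048))  # e.g., a ~1.3B-parameter model's hidden width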


> But I think the purpose of this research was not to create an excellent GPT model.

Yes, understood. I feel that this phrase is a response to the other commenter who suggested that Cerebras should release a ChatGPT-competitive model. I don't think that's easy, and I don't think it's a focus for a hardware maker such as Cerebras.

> I believe it was to explore the scaling effects on Cerebras hardware and determine a helpful framework for compute-optimal training regimes so that customers who might use Cerebras hardware can be confident that:

> 1) Standard AI/ML scaling assumptions still apply on this hardware.

This is my point. Is it possible to train a 100B model on Cerebras hardware? 500B? In this respect, quality is secondary: what matters is demonstrating that the capability exists.


> Maximal Update Parameterization (μP)

The use of μ (mu) as a sort of… pun acronym thing is pretty clever, nice one.


Thanks for publishing this. I quickly skimmed the paper and saw the impressive linear scaling as you scaled to 16 nodes. How long did it take to train the various models in wall-clock time?


Simply focusing on the "better in every regard" part of the comment.

One example where Cerebras systems perform well is when a user is interested in training models that require long sequence lengths or high-resolution images.

One such example is this publication, https://www.biorxiv.org/content/10.1101/2022.10.10.511571v2, where researchers were able to build genome-scale language models that can learn the evolutionary landscape of SARS-CoV-2 genomes. In the paper, the researchers note: "We note that for the larger model sizes (2.5B and 25B), training on the 10,240 length SARS-CoV-2 data was infeasible on GPU clusters due to out-of-memory errors during attention computation."
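
To make the OOM point concrete: with standard attention, the score matrix alone grows quadratically with sequence length. A rough illustration (the head count, batch size, and precision here are hypothetical, not taken from the paper):

    def attn_score_bytes(seq_len, n_heads=32, batch=1, bytes_per_el=2):
        # one (L x L) score matrix per head, per sequence, in fp16
        return batch * n_heads * seq_len * seq_len * bytes_per_el

    for L in (2048, 10_240):
        gib = attn_score_bytes(L) / 2**30
        print(f"L={L:>6}: ~{gib:.2f} GiB of attention scores per layer, per sample")
    # L=  2048: ~0.25 GiB; L= 10240: ~6.25 GiB -- and that's before activations,
    # gradients, optimizer state, or any of the other layers.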


Someone posted this from the Cerebras Discord earlier, but sharing again for visibility -

"We chose to train these models to 20 tokens per param to fit a scaling law to the Pile data set. These models are optimal for a fixed compute budget, not necessarily "best for use". If you had a fixed parameter budget (e.g., because you wanted to fit models on certain hardware) you would train on more tokens. We do that for our customers that seek that performance and want to get LLaMA-like quality with a commercial license"


Sounds like we should crowd-fund the cost to train and open source one of these models with LLaMa-like quality.

I'd chip in!


TBH that seems like a good job for Cerebras.

There are plenty of such efforts, but the organizer needs some kind of significance to attract a critical mass, and an AI ASIC chip designer seems like a good candidate.

Then again, maybe they prefer a bunch of privately trained models over an open one since that sells more ASIC time?


> Cerebras Discord

This is really weird to hear out loud.

I still think of Discord as a niche gaming chatroom, even though I know that (for instance) a wafer scale IC design company is hosting a Discord now.


The other AI companies also have discords. Midjourney does like biweekly townhalls on discord, from the CEO directly to the interested users. Emad also hangs out on the SD discord.

Talk about having a pulse on customers; it doesn't get more direct than that. Any company that (1) focuses on a tech-savvy, tech-friendly customer base and (2) has customers who are passionate about the product should probably orient its entire customer support model around Discord.

