StarCoder2 and Stack V2

By versatileai | July 4, 2025

BigCode is releasing StarCoder2, the next generation of transparently trained open-code LLMs. All StarCoder2 variants were trained on The Stack v2, a new, large, and high-quality code dataset. All models, datasets, and processing and training code are being released. For more information, see the paper.

What is StarCoder2?

StarCoder2 is a family of open LLMs for code and comes in three sizes: 3B, 7B, and 15B parameters. The flagship StarCoder2-15B model was trained on more than 4 trillion tokens and over 600 programming languages from The Stack v2. All models use Grouped Query Attention and a context window of 16,384 tokens with sliding-window attention of 4,096 tokens, and were trained with the Fill-in-the-Middle objective.
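As a concrete starting point, the models can be loaded with the Hugging Face transformers library. Below is a minimal sketch, assuming the bigcode/starcoder2-3b checkpoint name on the Hub and a transformers version recent enough to include StarCoder2 support; adjust the checkpoint and dtype for your hardware.

```python
# Minimal generation sketch for StarCoder2 via Hugging Face transformers.
# Assumes the bigcode/starcoder2-3b checkpoint name on the Hub and a
# transformers version recent enough to support the StarCoder2 architecture.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-3b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16)

# StarCoder2 is a base model: prompt it with code to complete, not chat turns.
prompt = "def print_hello_world():"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```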

StarCoder2 offers three model sizes: a 3-billion-parameter model trained by ServiceNow, a 7-billion-parameter model trained by Hugging Face, and a 15-billion-parameter model trained by NVIDIA with NVIDIA NeMo on NVIDIA accelerated infrastructure.

StarCoder2-3B was trained on 17 programming languages from The Stack v2 on 3+ trillion tokens. StarCoder2-7B was trained on 17 programming languages from The Stack v2 on 3.5+ trillion tokens. StarCoder2-15B was trained on 600+ programming languages from The Stack v2 on 4+ trillion tokens.

StarCoder2-15B is the best in its size class and matches 33B+ models on many evaluations. StarCoder2-3B matches the performance of the first-generation StarCoder-15B.
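Because the models are trained with the Fill-in-the-Middle objective mentioned above, they can also infill code between a given prefix and suffix, not just complete left to right. The sketch below assumes the StarCoder-family FIM special tokens (<fim_prefix>, <fim_suffix>, <fim_middle>); verify them against the tokenizer's special tokens before relying on these names.

```python
# Fill-in-the-Middle sketch: the model generates the code that belongs
# between a prefix and a suffix. The FIM token names follow the
# StarCoder-family convention and should be checked against the tokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-3b"  # any of the three sizes should work
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prefix = "def fibonacci(n):\n    "
suffix = "\n    return a\n"
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
# Only decode the newly generated middle segment.
middle = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(middle, skip_special_tokens=True))
```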

[Figure: StarCoder2 evaluation results]

What is Stack V2?


The Stack v2 is the largest open code dataset suitable for LLM pretraining. It is larger than The Stack v1, with improved language and license detection procedures and better filtering heuristics. In addition, the training dataset is grouped by repository, allowing models to be trained with repository context.

The dataset is derived from the Software Heritage archive, the largest public archive of software source code together with its development history. Software Heritage, launched by Inria in collaboration with UNESCO, is an open, non-profit initiative to collect, preserve, and share the source code of all publicly available software. We are grateful to Software Heritage for providing access to this valuable resource. For more information, please visit the Software Heritage website.

The Stack v2 is accessible through the Hugging Face Hub.
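As a hedged sketch of access, assuming the bigcode/the-stack-v2 dataset id on the Hub: the dataset is gated, so you first accept its terms on the dataset page and authenticate locally; streaming avoids downloading the full corpus up front.

```python
# Sketch: stream records from The Stack v2 on the Hugging Face Hub.
# Assumes the bigcode/the-stack-v2 dataset id; the dataset is gated, so
# accept its terms on the Hub and run `huggingface-cli login` first.
from datasets import load_dataset

ds = load_dataset("bigcode/the-stack-v2", split="train", streaming=True)
for record in ds:
    # Inspect the available fields of a single record, then stop.
    print(record.keys())
    break
```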

About BigCode

BigCode is an open scientific collaboration led jointly by Hugging Face and ServiceNow, working on the responsible development of large language models for code.

Links

You can find all resources and links at huggingface.co/bigcode!
