We're building FOSAI models! Cast your votes and pick your tunings.

Blaed@lemmy.world · edit-2 1 year ago

librecat@lemmy.basedcount.com · 1 year ago

Are the llama2 models Apache 2.0 compatible? I think they use a custom license with some restrictions, could be totally wrong though.

Blaed@lemmy.world · edit-2 1 year ago

This will be a fine-tuned model, so it may inherit some of the permissions and license terms of its foundation model, and there may be other implications depending on your country or local law.

You are correct, if we chose Llama 2 - the fine-tune derivative may be subject to their original license terms. However, Apache 2.0 would apply and transfer to something like a fine-tuned version of Mistral, since its base license is also Apache 2.0.

If there is enough support - I’d be more than open to creating an entirely new foundation model family. This would be a larger undertaking than this initial fine-tuning deployment, but building a completely free FOSAI foundation family of models was the ultimate goal of this project. If this garners enough attention, I could absolutely put energy and focus into creating another Mistral-like product instead of splashing around with fine-tuning.

Whatever would help everyone the most! I like where you’re thinking though, I’m going to update the thread to include an option to vote for a new foundation family instead. At the end of the day, it’s likely I’ll do all of the above - I’m just not sure in what order yet…

ffhein@lemmy.world · 1 year ago

You are correct, if we chose Llama 2 - the fine-tune derivative may be subject to their original license terms

The first time I read through the Llama 2 license I thought it said that any Llama derivative work also had to be licensed under the same license, but reading it again I think the only requirement is that you include a copy of the Llama 2 license text. Though I suppose that if someone uses your Llama 2 fine-tune to create something, it would also count as a “Llama 2 derivative work” and thus be affected by the original license. I’m obviously no license lawyer, but personally I wouldn’t want to risk a legal battle with a company the size of Meta, so I’d vote for the other options just to be on the safe side.

If there is enough support - I’d be more than open to creating an entirely new foundation model family.

Do you have the resources for this to be a viable option? Llama-2 7b used 184,320 GPU hours on A100-80GB, and while the exact numbers for Mistral haven’t been revealed, one article claims it was around 200k hours (we don’t know whether those were A100 or H100 hours). And if you have that kind of money to spend, are you confident that the end result will be better than Mistral? If not, why spend that much on creating something equivalent or possibly even inferior? Then there’s also the question of how long a model is going to be relevant before some other new model with all the latest innovations is released and makes everything else look outdated… Even if you can create a model which rivals llama-2 and mistral now, are you going to create a new one to compete with llama-3 and mistral-2 when those come along?

Sorry for the negativity but I think creating a base model sounds likely to be a massive waste of resources. If you have a lot of time and money to throw at this project, I think it would be much better spent on fine-tuning existing models.
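
For a rough sense of scale, the Llama-2 7b GPU-hour figure quoted above can be converted into wall-clock time for a given cluster size. This is only a back-of-the-envelope sketch: it assumes perfect scaling across GPUs, and the hourly rate below is a hypothetical placeholder, not a real quote from any provider.

```python
# GPU-hour figure quoted in the thread for Llama-2 7b (A100-80GB).
LLAMA2_7B_GPU_HOURS = 184_320


def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Days of continuous training, assuming perfect scaling across num_gpus."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return gpu_hours / num_gpus / 24


def rental_cost(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Total rental cost for a given (assumed) hourly price per GPU."""
    return gpu_hours * usd_per_gpu_hour


if __name__ == "__main__":
    # Cluster sizes chosen purely for illustration.
    for n in (8, 64, 512):
        days = wall_clock_days(LLAMA2_7B_GPU_HOURS, n)
        print(f"{n:>4} GPUs: {days:,.0f} days")

    # Hypothetical placeholder rate; substitute a real price before relying on this.
    hypothetical_usd_per_gpu_hour = 2.0
    cost = rental_cost(LLAMA2_7B_GPU_HOURS, hypothetical_usd_per_gpu_hour)
    print(f"Cost at ${hypothetical_usd_per_gpu_hour:.2f}/GPU-hour: ${cost:,.0f}")
```

Real runs rarely scale perfectly, so actual wall-clock time would likely be longer than these numbers suggest.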

Blaed@lemmy.world · 1 year ago

I wouldn’t want to risk a legal battle with a company the size of Meta, so I’d vote for the other options just to be on the safe side.

Completely reasonable, I agree.

Do you have the resources for this to be a viable option?

Where there’s a will, there’s a way. I could muster the resources for a foundation model, but it’s definitely not the best option we have at our disposal. The original plan was a.) fine-tune a small series (short-term) b.) release a foundation model (long-term). I only recently considered skipping Plan A, but I’m glad I got feedback that talked me out of it. Would’ve enjoyed the process nonetheless.

Are you confident that the end result will be better than Mistral? If not, why spend that much on creating something equivalent or possibly even inferior?

Of course not. I don’t do this to be the best. I offer to do this to understand. Documenting how to build and release a foundation model from start to finish is knowledge that could be valuable to someone else - which is why I was willing to skip ahead if that was a topic others wanted to dive into. For me, it’s more about the friends we make along the way. There is grace in polishing a product and being the best, but I’d like to think there is also something special in doing something just to document it for others. There is something fulfilling in exploring a new frontier with nothing but sheer curiosity.

Then there’s also the question of how long a model is going to be relevant before some other new model with all the latest innovations is released and makes everything else look outdated… Even if you can create a model which rivals llama-2 and mistral now, are you going to create a new one to compete with llama-3 and mistral-2 when those come along?

I also don’t do this to be relevant. To be a part of this is enough for me. In my studies, I have found something bigger than me - I see myself doing this for many years, so I know I’ll be around to see it evolve and current technologies become irrelevant in time. If you consider existing alongside these models as ‘competing’ then yes, I would be doing that I suppose.

Sorry for the negativity but I think creating a base model sounds likely to be a massive waste of resources. If you have a lot of time and money to throw at this project, I think it would be much better spent on fine-tuning existing models.

Don’t worry, it was great feedback. Exactly why I made this post! I’m glad you made all your points. It’s the same logic I had (and the same logic I was willing to throw aside for others). At this point, it seems like fine-tuning is what most of you want to see. So fine-tuning it shall be!

Anony Moose@lemmy.ca · 1 year ago

I don’t have too much experience with deep learning, I’m just an enthusiastic spectator. With that said, it seems to me that it would help to build some momentum first with a fine-tuned model based on an existing foundation model. With a win under our belt, it would be more feasible to set our eyes on the goal of a new foundation model in the future.

Thanks so much for doing this, this seems really cool!

Blaed@lemmy.world · 1 year ago

I appreciate your comment! It seems like we’re going the fine-tuning route. I think it’s the best way to do it too. I’m still glad I floated around the foundation model idea. We’ll get one of our own eventually!

Welcome to the show! Enthusiast or not, you are part of !fosai@lemmy.world. Your input is valued and your curiosity is encouraged!

Anony Moose@lemmy.ca · 1 year ago

Woohoo! This is exciting :)

We’re Building FOSAI Models! 🤖

Fine-Tuned Use Case ☑️

Foundation Model ☑️

Model Name & Convention

Datasets ☑️

Alignment ☑️

License ☑️

Costs

Cast Your Votes! ☑️