Let's Talk about AI
Hello, here is an open space for everyone to talk, share, ask and show anything about AI.
Has anyone pre-trained an LLM from scratch? If yes, please share your experience, things to consider while training, notes, tips, etc.
Hi, I am also interested in LLMs. I am about to start this research from next week, please give any inputs
Hey @Shashank2k3 , if you want your own LLM, first you need huge amounts of data. You can start by fine-tuning already available good LLMs like Gemma, Phi, LLaMA, Mistral, etc. with your dataset. Start with small models in the 4-7B parameter range. For pre-training an LLM from scratch you need enormous data, good resources like heavy-duty GPUs and CPUs, and also knowledge of training techniques, NLP, etc. You can always brainstorm with ChatGPT to get more knowledge.
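Adding to the point about heavy-duty GPUs: parameter-efficient methods like LoRA are a common way to make fine-tuning those 4-7B models affordable. A rough sketch of the arithmetic (the 4096x4096 projection and rank 8 here are illustrative assumptions, not from any specific model):

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # LoRA replaces a full weight update (d_in x d_out) with two
    # low-rank factors: A (d_in x rank) and B (rank x d_out).
    return d_in * rank + rank * d_out

# Illustrative 4096x4096 attention projection, rank 8:
full_update = 4096 * 4096                   # 16,777,216 trainable values
lora_update = lora_params(4096, 4096, 8)    # 65,536 trainable values
print(f"LoRA trains {lora_update / full_update:.2%} of the full update")
# → LoRA trains 0.39% of the full update
```

That is why a single consumer GPU can often fine-tune what full training could not.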
Hey @kalashshah19 , thanks for the input! I already have a solid foundation in these areas from my Bachelor's degree in AIML, and now I'm looking to dive deeper into the world of LLMs.
Great !
Yupp so what you guys do, i mean profession!!!
I am an Associate Data Scientist at Casepoint.
What about you ?
I am an Associate Software Developer at Brillius Technologies
Great, where are you from and where is the company ?
I am from Hyderabad; the company originates from Pleasanton, California, but I am working at the Hyderabad branch.
Okay, so do u work in AI ML there ?
Hello guys
Yo whatsup ?
fine and what about u?
I am great. Where do u work ?
I work at HelpingAI
Yeah, previously my company was on the IT staffing side; just 6 months ago they started in IT solutions to build a product called Brillius AI Tutor, where we are developing an online learning platform for IT professionals. I'm taking care of many things there, and AIML is one of those things.
Are you talking about this ? - https://huggingface.co/HelpingAI
Is it a company or Open Source Community ?
Wow, great. Is it a startup or big company ?
A small startup
Okay and is the link for the Company correct ?
yes
Is it your company (startup)? I mean are you owner or founder ?
Yes, I am its founder.
Great, keep up the good work and whats your vision ?
@kalashshah19 it's a medium-scale company, but for me it's a startup because we are the first IT team and we don't have any seniors to guide us
Making AGI token- and time-efficient
Btw try Dhanishtha 2.0 preview
HelpingAI.co
ohh, I see
Cool
Sure I will !
hey bros
yo whatsup !
Can we create an interface for our community for easy chatting
Yeah Sure, good idea !
Shashank2k3/Fake-Profile-Detection-Instagram, please check out my project Fake Profile Detection, where I have trained a random forest model using my own dataset consisting of metadata from various fake and real profiles on Instagram.
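For anyone curious what profile-metadata features for such a random-forest classifier can look like, here is a minimal pure-Python sketch. The field names and the follow-ratio heuristic are my own assumptions for illustration, not taken from Shashank's actual dataset:

```python
def profile_features(profile: dict) -> dict:
    # Hypothetical metadata features of the kind often fed to a
    # random-forest fake-profile classifier.
    followers = profile["followers"]
    following = profile["following"]
    return {
        # fake accounts often follow many but are followed by few
        "follow_ratio": followers / max(following, 1),
        "has_profile_pic": int(profile["has_profile_pic"]),
        "bio_length": len(profile.get("bio", "")),
        "posts": profile["posts"],
    }

# A suspicious-looking example profile:
features = profile_features(
    {"followers": 12, "following": 1200, "has_profile_pic": False, "bio": "", "posts": 0}
)
```

Rows like `features` would then be stacked into a matrix and passed to the forest's `fit`.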
Great idea! We should create a Discord server
Nice, good effort !
So you are suggesting that the HuggingFace community should chat in a Discord community?
I think he meant creating an interface on HuggingFace for chatting.
Hello everyone, I'm a 2nd-year DSAI student at IIT Guwahati. I have fine-tuned a few models and published a few articles on ResearchGate. I am determined to build state-of-the-art AI in India for the world.
It's really great to be a part of this community.
Thank you
Hey Ashish, it's great to have you in our community. We will all grow as Indian AI devs and learn new things from each other. Keep in the loop!
And also remember that we are all friends, so no need to be formal!
Btw try Dhanishtha 2.0 preview
HelpingAI.co
I have used it (Spaces), I even shared it with my friends
has anyone contributed datasets or models on AIkosh?
https://huggingface.co/Neural-Hacker/Qwen-BharatBench-Legal
please try this and share your feedback (i'm working on another version to make it even better)
sure !
please add it in the collection if you like it
Hi everyone :)
I created a dataset on the Bhagavad Gita, which happens to be the most liked and most downloaded Bhagavad Gita dataset on HuggingFace. I'd appreciate it if you make something out of it.
Link: https://huggingface.co/datasets/JDhruv14/Bhagavad-Gita_Dataset
Already liked it, and will use it in a planned project
Done !
Woah great !
Big news from XenArcAI!
We've just released our new dataset: Bhagwat-Gita-Infinity
What's inside:
- Verse-aligned Sanskrit, Hindi, and English
- Clean, structured, and ready for ML/AI projects
- Perfect for research, education, and open-source exploration
Hugging Face: https://huggingface.co/datasets/XenArcAI/Bhagwat-Gita-Infinity
Let's bring timeless wisdom into modern AI together
I hope you all love this dataset and contribute positively to AI/ML Research
Great, congratulations !
Hey everyone!
I've been working on this dataset for the last 15 days and it's finally done. I'm pleased to announce that I have created the first QnA dataset for the Bhagavad Gita, not only in English but also in Hindi and Gujarati.
What's inside:
- Verse-aligned English, Hindi and Gujarati questions
- Each verse is paired with 5 question types exploring different aspects.
- Perfect for blending spirituality and technology
Link: https://huggingface.co/datasets/JDhruv14/Bhagavad-Gita-QA
(P.S.: I'm the one who created the most liked and most downloaded dataset for the Bhagavad Gita)
Hey everyone,
I fine-tuned Qwen3-0.6B on the soketlabs/bhasha-wiki-indic dataset (I used only ~50k Hindi samples). The training went fine, loss was around 1.46 and accuracy about 59%, but the outputs are completely wrong and make no sense.
I think the issue might be with the dataset format, since it's plain text and not instruction-based, so the model probably didn't learn proper Q&A or instruction-following.
Has anyone faced this before? Should I switch to something like ai4bharat/indic-instruct-data-v0.1 for better results? Any suggestions?
Same bro, I fine-tuned Qwen2.5-3B on my Gita dataset and the accuracy was around 60% with loss around 1.42. The responses were okayish and it was writing correct Hindi, but when I made a Space out of it, it just writes anything
Yeah exactly bro, I thought good loss & accuracy meant it was learning properly, but it turns out it's more about the data structure. My model also writes proper Hindi but gives totally unrelated or random outputs. I think plain-text data only teaches it next-word prediction and not how to follow instructions or answer properly.
I'm planning to try instruction-style data next (ai4bharat/indic-instruct-data-v0.1). Maybe that'll fix it. Have you tried using instruction-tuned datasets or formatting your data as Q&A pairs? How did you fix that issue?
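Since both of you hit the same plain-text issue: the usual fix is to reshape each sample into an instruction/chat record before fine-tuning, so the model learns a prompt-to-response mapping instead of bare next-word prediction. A minimal sketch — the `messages` schema here is an assumption; use whatever format your trainer's chat template expects:

```python
def to_chat_record(question: str, answer: str) -> dict:
    # One supervised example: the model learns to map the user turn
    # to the assistant turn, instead of continuing arbitrary text.
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }

# Hypothetical QA pair for illustration:
record = to_chat_record("What does verse 2.47 teach?", "Act without attachment to results.")
```

Most SFT trainers then apply the model's chat template to each `messages` list and mask the loss to the assistant turn.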
SarathiAI v1.0
Eternal Gita wisdom, guided by AI.
I tried fine-tuning the Qwen2.5-3B model on a custom Bhagavad Gita dataset for the first time. The model now understands user queries more accurately and responds with answers grounded in Gita teachings. It's not perfect, but I'm happy with everything I learned building it. Please give it a try, and you can share what I should improve.
Link to Space: https://huggingface.co/spaces/JDhruv14/Sarathi.AI
Link to Model: https://huggingface.co/JDhruv14/Qwen2.5-3B-Gita-FT
Link to Dataset: https://huggingface.co/datasets/JDhruv14/Bhagavad-Gita-QA
Guys, I am building a model (fine-tuning) and I need a lot of compute and VRAM. Please tell me where I can find free or cheap GPUs, because AWS, Azure, etc. are very costly, and Kaggle may not work here, but I will try it tomorrow. I am thinking of using Ola cloud, but if there's any platform better than Ola, please suggest it.
Hello everyone,
I'm excited to share NEET_BioBERT, a fine-tuned lightweight transformer model trained specifically on NEET-style biology multiple-choice questions.
It's designed for educational AI assistants, practice exam bots and MCQ reasoning systems.
Explore it here: https://huggingface.co/Neural-Hacker/NEET_BioBERT
use and upvote it
Is it fine tuned on question papers only or all content of NEET like books, PDFs, etc ?
It is fine-tuned on a dataset consisting of ~800 questions, including practice questions and PYQs
Nice !
I'm pleased to share that, after putting in a lot of effort and hard work, I have curated the first high-quality and clean audio dataset of the Shrimad Bhagavad Gita.
I hope this dataset proves to be helpful to all
Link to Dataset: https://huggingface.co/datasets/JDhruv14/Bhagavad-Gita_Audio
ॐ नमो भगवते वासुदेवाय 🙏🙏
Great man, will check it out.
Tried many arenas and LLM battles but couldn't find the best LLM for Indian use cases? Try the Indic LLM Arena by AI4Bharat (IIT Madras) to find the most suitable LLM for Indian use cases.
link: https://arena.ai4bharat.org/#/chat
Will try !
I'm pleased to share that, after putting in a lot of effort and hard work, I have curated a high-quality and clean dataset of the Mahabharata.
I hope this dataset proves to be helpful to all
Link to Dataset: https://huggingface.co/datasets/JDhruv14/Mahabharata
Next stop: Ramayan
Great ! Will try it out.
Guys, what do u think, which is better ?
- Finetuning LLM model for specific data
- RAG system for the same specific data
What and why ?
I believe it depends on what the user/client wants.
Fine-tuning works best with large, labeled, consistent data that teaches behavior like format, rules, and decisions. A major con is hallucination; in law, healthcare, etc., accuracy is everything.
For large factual or knowledge-heavy data, RAG is better because it can scale, stays up to date, hallucinates less and avoids retraining.
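To make the RAG half concrete, here is a toy retriever sketch. Real systems rank by embedding similarity rather than word overlap, and the tiny corpus below is made up for illustration:

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Toy lexical retriever: rank documents by word overlap with the query.
    # Production RAG would use embeddings + a vector index instead.
    q_words = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:k]

docs = [
    "Verse 2.47 teaches action without attachment to results.",
    "The Mahabharata has eighteen parvas.",
    "RAG retrieves context before generation.",
]
context = retrieve("what does verse 2.47 teach about attachment", docs, k=1)
# The retrieved context is then prepended to the LLM prompt, so answers
# stay grounded in the documents without any retraining.
```

Swapping the corpus means re-indexing, not re-training — which is exactly the "stays up to date" advantage mentioned above.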
I tried it and it is quite handy.
After investing a lot of days and nights passionately researching our Sanatana Dharma, I'm excited to share that I have over 150k Sanskrit verses from different books, with translation, transliteration and shlokas, in my collection.
My collection has around 10 books including the Mahabharat, Ramayana, Gita, Markandeya Purana, Devi Mahatmya, Yoga Vasistha and so on.
Please share it with everyone and please upvote my work. More to come soon.
Link to Collection: https://huggingface.co/collections/JDhruv14/sanatana-dharma
Amazing brother, this is really important for our Indic LLMs
Thanks man. It means a lot
Under the Mahabharata collection, could you please change the 5th column's name from translation to english_text? The 6th column is actually the translation, which you have named correctly.
Great work indeed, thanks a lot. Much needed.
I remember I named one column transliteration in each dataset. Also, thank you for your kind words.
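On the column rename request above: if the data is published with the HF `datasets` library, `Dataset.rename_column("translation", "english_text")` does it directly. The pure-Python equivalent for list-of-dict rows is a sketch like this (column names taken from the request; the sample row is hypothetical):

```python
def rename_column(rows: list[dict], old: str, new: str) -> list[dict]:
    # Rename one key in every record, keeping all other columns intact.
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]

rows = [{"verse": "1.1", "translation": "..."}]
fixed = rename_column(rows, "translation", "english_text")
```

After the rename, re-pushing the dataset keeps downstream column names unambiguous.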
Very soon my HF account will complete its first year. So, will it retain the 100 GB private storage, ZeroGPU access, and inference capabilities after completing my first year as a user?
Awesome job bro. You can fine-tune models based on this and also publish them on HF, referencing the same dataset.
Also publish everything on Kaggle.
Don't know, haven't completed 1 year yet.
Hey everyone,
read my blog on NPUs and the OpenVINO toolkit.
Happy New Year!
