Let's Talk about AI

#1
by kalashshah19 - opened
Indian AI Developers org
โ€ข
edited Aug 22, 2025

Hello, here is an open space for everyone to talk, share, ask and show anything about AI.

kalashshah19 pinned discussion
Indian AI Developers org

Has anyone pre-trained LLM model from scratch ? If yes then share your experience, things to consider while training, notes, tips etc.

Indian AI Developers org

Hi i am also intrested into LLM Model , i am about to start this reserach from next week please give any inputs

Indian AI Developers org

Hi i am also intrested into LLM Model , i am about to start this reserach from next week please give any inputs

Hey @Shashank2k3 , if you want your own LLM model, first you need huge data. You can start with fine tuning already available good LLM models like Gemma, Phi, LLAMA, mistral etc with your dataset. Start with small models of sizes like 4 to 7B parameters. For pre-training LLM from scratch you need enormous data, good resources like heavy duty GPUs and CPUs and also have knowledge of training techniques, NLP, etc . You can always brainstorm with ChatGPT to get more knowledge.

Indian AI Developers org

Hey @kalashshah19 , thanks for the input! I already have a solid foundation in these areas from my Bachelor's degree in AIML, and now Iโ€™m looking to dive deeper into the world of LLMs.

Indian AI Developers org

Hey @kalashshah19 , thanks for the input! I already have a solid foundation in these areas from my Bachelor's degree in AIML, and now Iโ€™m looking to dive deeper into the world of LLMs.

Great !

Indian AI Developers org

Yupp so what you guys do, i mean profession!!!

Indian AI Developers org

Yupp so what you guys do, i mean profession!!!

I am an Associate Data Scientist at Casepoint.
What about you ?

Indian AI Developers org

I am an Associate Software Developer at Brillius Technologies

Indian AI Developers org

I am an Associate Software Developer at Brillius Technologies

Great, where are you from and where is the company ?

Indian AI Developers org

I am from hyderabad , company origin is calfornia pleasonton , but i am working at hyderabad branch.

Indian AI Developers org

I am from hyderabad , company origin is calfornia pleasonton , but i am working at hyderabad branch.

Okay, so do u work in AI ML there ?

Indian AI Developers org

Hello guys

Indian AI Developers org

Hello guys

Yo whatsup ?

Indian AI Developers org

Hello guys

Yo whatsup ?

fine and what about u?

Indian AI Developers org

Hello guys

Yo whatsup ?

fine and what about u?

I am great. Where do u work ?

Indian AI Developers org

Hello guys

Yo whatsup ?

fine and what about u?

I am great. Where do u work ?

I work at HelpingAI

Indian AI Developers org

Yeah previously my company is in it staffing side , just 6 months ago they have started in IT solutions to build product called brillius AI tutor where we are developing an online learning platform for it professionals, in that I'm taking care of many things where AIML is one of those things.

Indian AI Developers org

Hello guys

Yo whatsup ?

fine and what about u?

I am great. Where do u work ?

I work at HelpingAI

Are you talking about this ? - https://huggingface.co/HelpingAI
Is it a company or Open Source Community ?

Indian AI Developers org

Yeah previously my company is in it staffing side , just 6 months ago they have started in IT solutions to build product called brillius AI tutor where we are developing an online learning platform for it professionals, in that I'm taking care of many things where AIML is one of those things.

Wow, great. Is it a startup or big company ?

Indian AI Developers org

Hello guys

Yo whatsup ?

fine and what about u?

I am great. Where do u work ?

I work at HelpingAI

Are you talking about this ? - https://huggingface.co/HelpingAI
Is it a company or Open Source Community ?

A small startup

Indian AI Developers org

Hello guys

Yo whatsup ?

fine and what about u?

I am great. Where do u work ?

I work at HelpingAI

Are you talking about this ? - https://huggingface.co/HelpingAI
Is it a company or Open Source Community ?

A small startup

Okay and is the link for the Company correct ?

Indian AI Developers org

yes

Indian AI Developers org
โ€ข
edited Sep 10, 2025

yes

Is it your company (startup)? I mean are you owner or founder ?

Indian AI Developers org

yes

Is it your company (startup)? I mean are you owner or founder ?

Yes, I am its founder.

Indian AI Developers org

yes

Is it your company (startup)? I mean are you owner or founder ?

Yes, I am its founder.

Great, keep up the good work and whats your vision ?

deleted
This comment has been hidden
Indian AI Developers org

@kalashshah19 its a medium scale company , but for me its a startup cause we are the first team of IT , and we dont have any seniors to guide us

Indian AI Developers org
โ€ข
edited Sep 10, 2025

yes

Is it your company (startup)? I mean are you owner or founder ?

Yes, I am its founder.

Great, keep up the good work and whats your vision ?

Making agi token and time efficient

Indian AI Developers org

Btw try Dhanishtha 2.0 preview
HelpingAI.co

Indian AI Developers org

@kalashshah19 its a medium scale company , but for me its a startup cause we are the first team of IT , and we dont have any seniors to guide us

ohh, I see

Indian AI Developers org

yes

Is it your company (startup)? I mean are you owner or founder ?

Yes, I am its founder.

Great, keep up the good work and whats your vision ?

Making agi token and time efficient

Cool

Indian AI Developers org

Btw try Dhanishtha 2.0 preview
HelpingAI.co

Sure I will !

Indian AI Developers org

hey bros

Indian AI Developers org

hey bros

yo whatsup !

Indian AI Developers org

Can we create an interface for our community for easy chatting

Indian AI Developers org

Yeah Sure, good idea !

Indian AI Developers org
โ€ข
edited Sep 13, 2025

Shashank2k3/Fake-Profile-Detection-Instagram, please check out my project Fake Profile Detection, where I have trained a random forest model using my own dataset consisting of metadata from various fake and real profiles on Instagram.

Indian AI Developers org

Can we create an interface for our community for easy chatting

Great idea! We should create a Discord server

Indian AI Developers org

Shashank2k3/Fake-Profile-Detection-Instagram, please check out my project Fake Profile Detection, where I have trained a random forest model using my own dataset consisting of metadata from various fake and real profiles on Instagram.

Nice, good effort !

Indian AI Developers org

Can we create an interface for our community for easy chatting

Great idea! We should create a Discord server

So you are suggesting that the HuggingFace community should chat in Discord community ๐Ÿ˜‚
I think he told to create an inference on HuggingFace for chatting.

Indian AI Developers org

Can we create an interface for our community for easy chatting

Great idea! We should create a Discord server

So you are suggesting that the HuggingFace community should chat in Discord community ๐Ÿ˜‚
I think he told to create an inference on HuggingFace for chatting.

๐Ÿ˜‚

Indian AI Developers org

Hello everyone, I'm a 2nd year DSAI student at IIT Guwahati, i have fine-tuned few models and i have published few articles on ResearchGate. I am determined to build state-of-the-art AI in India for the world.
Its really great to be a part of this community.
Thank you

Indian AI Developers org
โ€ข
edited Sep 24, 2025

Hello everyone, I'm a 2nd year DSAI student at IIT Guwahati, i have fine-tuned few models and i have published few articles on ResearchGate. I am determined to build state-of-the-art AI in India for the world.
Its really great to be a part of this community.
Thank you

Hey Ashish, its great to have you in our community. We all will grow as Indian AI Devs and learn new things from each other. Keep in loop !
And also remember that we all are friends so no need to be formal !

Indian AI Developers org

Btw try Dhanishtha 2.0 preview
HelpingAI.co

i have used it (spaces), i even shared it with my friends

Indian AI Developers org

has anyone contributed datasets or models on AIkosh?

Indian AI Developers org
โ€ข
edited Sep 26, 2025

https://huggingface.co/Neural-Hacker/Qwen-BharatBench-Legal
please try this and share your feedback (i'm working on another version to make it even better)

Indian AI Developers org

sure !

Indian AI Developers org

sure !

please add it in the collection if you like it

Indian AI Developers org

Hi everyone :)
I created a dataset on Bhagavad Gita which happens to be the most liked dataset and most downloaded dataset on HuggingFace. I'd appreciate if you make something out of it.
Link :https://huggingface.co/datasets/JDhruv14/Bhagavad-Gita_Dataset

Indian AI Developers org

Hi everyone :)
I created a dataset on Bhagavad Gita which happens to be the most liked dataset and most downloaded dataset on HuggingFace. I'd appreciate if you make something out of it.
Link :https://huggingface.co/datasets/JDhruv14/Bhagavad-Gita_Dataset

already liked it and will use in a planned project

Indian AI Developers org

sure !

please add it in the collection if you like it

Done !

Indian AI Developers org

Hi everyone :)
I created a dataset on Bhagavad Gita which happens to be the most liked dataset and most downloaded dataset on HuggingFace. I'd appreciate if you make something out of it.
Link :https://huggingface.co/datasets/JDhruv14/Bhagavad-Gita_Dataset

Woah great !

Indian AI Developers org

๐Ÿš€ Big news from XenArcAI!

Weโ€™ve just released our new dataset: Bhagwatโ€‘Gitaโ€‘Infinity ๐ŸŒธ๐Ÿ“–

โœจ Whatโ€™s inside:

  • Verseโ€‘aligned Sanskrit, Hindi, and English
  • Clean, structured, and ready for ML/AI projects
  • Perfect for research, education, and openโ€‘source exploration

๐Ÿ”— Hugging Face: https://huggingface.co/datasets/XenArcAI/Bhagwat-Gita-Infinity

Letโ€™s bring timeless wisdom into modern AI together ๐Ÿ™Œ

I hope you all love this dataset and contribute positively to AI/ML Research

Indian AI Developers org

๐Ÿš€ Big news from XenArcAI!

Weโ€™ve just released our new dataset: Bhagwatโ€‘Gitaโ€‘Infinity ๐ŸŒธ๐Ÿ“–

โœจ Whatโ€™s inside:

  • Verseโ€‘aligned Sanskrit, Hindi, and English
  • Clean, structured, and ready for ML/AI projects
  • Perfect for research, education, and openโ€‘source exploration

๐Ÿ”— Hugging Face: https://huggingface.co/datasets/XenArcAI/Bhagwat-Gita-Infinity

Letโ€™s bring timeless wisdom into modern AI together ๐Ÿ™Œ

I hope you all love this dataset and contribute positively to AI/ML Research

Great, congratulations !

Indian AI Developers org

Hey everyone ๐Ÿ‘‹

I've been working on this dataset since last 15 days and it's finally done. I'm pleased to announce that I have created the first QnA dataset for Bhagavad Gita not only in English but also in Hindi and Gujarati.

Whatโ€™s inside:

  • Verseโ€‘aligned English, Hindi and Gujarati questions
  • Each verse is paired with 5 question types exploring different aspects.
  • Perfect for blending Spirituality and Technology

Link : https://huggingface.co/datasets/JDhruv14/Bhagavad-Gita-QA

(P.S: I'm the one who created the most liked and most downloaded dataset for Bhagavad Gita)

Indian AI Developers org

Hey everyone๐Ÿ‘‹

I fine-tuned Qwen3-0.6B on the soketlabs/bhasha-wiki-indic dataset ( i used only ~50k Hindi samples ). The training went fine, loss was around 1.46 and accuracy about 59% but the outputs are completely wrong and make no sense.

I think the issue might be with the dataset format since itโ€™s plain text and not instruction-based so the model probably didnโ€™t learn proper Q&A or instruction-following.

Has anyone faced this before? Should I switch to something like ai4bharat/indic-instruct-data-v0.1 for better results? Any suggestions? ๐Ÿ™

Indian AI Developers org

Same bro, I ft qwen2.5 3b with my gita dataset and the acc was 60% something with loss around 1.42 and the responses were okayish but still it was writing correct hindi but when I made a space out of it, it's just writing anything

Indian AI Developers org

Same bro, I ft qwen2.5 3b with my gita dataset and the acc was 60% something with loss around 1.42 and the responses were okayish but still it was writing correct hindi but when I made a space out of it, it's just writing anything

Yeah exactly bro, I thought good loss & accuracy meant it was learning properly but turns out it's more about the data structure. My model also writes proper Hindi but gives totally unrelated or random outputs. I think because plain text data only teaches it next-word prediction and not how to follow instructions or answer properly.

Iโ€™m planning to try instruction-style data next (ai4bharat/indic-instruct-data-v0.1). Maybe thatโ€™ll fix it. Have you tried using instruction-tuned datasets or formatting your data as Q&A pairs? How you fixed that issue?

Indian AI Developers org

SarathiAI v1.0
Eternal Gita wisdom, guided by AI.

I tried fine-tuning Qwen2.5-3B model on a custom Bhagavad Gita dataset for the first time. The model now understands user queries more accurately and responds with answers grounded in Gita teachings. It's not perfect but I'm happy with everything I learned building it. Please give it a try and you can share what should I improve.

Link to Space: https://huggingface.co/spaces/JDhruv14/Sarathi.AI

Link to Model: https://huggingface.co/JDhruv14/Qwen2.5-3B-Gita-FT

Link to Dataset: https://huggingface.co/datasets/JDhruv14/Bhagavad-Gita-QA

Indian AI Developers org
โ€ข
edited Oct 6, 2025

Well, I am feeling good seeing your work guys.

@JDhruv14 , @Neural-Hacker

Indian AI Developers org

guys, i am building a model (fine-tuning), i want a lot of compute and vram please tell where i can find free or cheap gpu because aws, azure, etc are very costly and maybe kaggle won't work here but i will try it tomorrow. i am thinking to use ola cloud but if there's any platform better than ola then please suggest.

Indian AI Developers org

Hello everyone,

Iโ€™m excited to share NEET_BioBERT , a fine-tuned lightweight transformer model trained specifically on NEET-style biology multiple-choice questions.
Itโ€™s designed for educational AI assistants, practice exam bots and MCQ reasoning systems.

Explore it here: https://huggingface.co/Neural-Hacker/NEET_BioBERT

use and upvote it

Indian AI Developers org

Is it fine tuned on question papers only or all content of NEET like books, PDFs, etc ?

Indian AI Developers org

it is fine tuned on a dataset consisting ~800 questions including practice questions and pyqs

Indian AI Developers org

Nice !

Indian AI Developers org

I'm pleased to share that after putting in a lot of efforts and hard work, I have curated the first high quality and clean audio dataset of Shrimad Bhagavad Gita.

I hope this dataset proves to be helpful to all๐ŸŒธ

Link to Dataset : https://huggingface.co/datasets/JDhruv14/Bhagavad-Gita_Audio

เฅ เคจเคฎเฅ‹ เคญเค—เคตเคคเฅ‡ เคตเคพเคธเฅเคฆเฅ‡เคตเคพเคฏ ๐Ÿ™๐Ÿ™

Indian AI Developers org

@JDhruv14 amazing brother, keep it up

Indian AI Developers org

I'm pleased to share that after putting in a lot of efforts and hard work, I have curated the first high quality and clean audio dataset of Shrimad Bhagavad Gita.

I hope this dataset proves to be helpful to all๐ŸŒธ

Link to Dataset : https://huggingface.co/datasets/JDhruv14/Bhagavad-Gita_Audio

เฅ เคจเคฎเฅ‹ เคญเค—เคตเคคเฅ‡ เคตเคพเคธเฅเคฆเฅ‡เคตเคพเคฏ ๐Ÿ™๐Ÿ™

Great man, will check it out.

Indian AI Developers org

tried many arenas and LLM battles but couldn't find the best LLM for Indian use cases? try Indic LLM Arena by AI4Bharat (IIT Madras) to find the most suitable LLM for Indian use cases.
link: https://arena.ai4bharat.org/#/chat

image

Indian AI Developers org

Will try !

Indian AI Developers org

I'm pleased to share that after putting in a lot of efforts and hard work, I have curated the high quality and clean dataset of Mahabharata.

I hope this dataset proves to be helpful to all๐ŸŒธ

Link to Dataset : https://huggingface.co/datasets/JDhruv14/Mahabharata

Indian AI Developers org

I'm pleased to share that after putting in a lot of efforts and hard work, I have curated the high quality and clean dataset of Mahabharata.

I hope this dataset proves to be helpful to all๐ŸŒธ

Link to Dataset : https://huggingface.co/datasets/JDhruv14/Mahabharata

Next stop Ramayan๐Ÿ’ช

Indian AI Developers org
โ€ข
edited 10 days ago

I'm pleased to share that after putting in a lot of efforts and hard work, I have curated the high quality and clean dataset of Mahabharata.

I hope this dataset proves to be helpful to all๐ŸŒธ

Link to Dataset : https://huggingface.co/datasets/JDhruv14/Mahabharata

Great ! Will try it out.

Indian AI Developers org

Guys, what do u think, which is better ?

  1. Finetuning LLM model for specific data
  2. RAG system for the same specific data

What and why ?

Indian AI Developers org

Guys, what do u think, which is better ?

  1. Finetuning LLM model for specific data
  2. RAG system for the same specific data

What and why ?

i believe it depends on the user/client what they want.
Fine-tuning works best with large labeled, consistent data that teaches behavior like format, rules, decisions. A major con is hallucination, in law, healthcare, etc accuracy is everything.
For large factual or knowledge-heavy data, RAG is better because it can scale, stays up to date, hallucinates less and avoids retraining.

Indian AI Developers org

tried many arenas and LLM battles but couldn't find the best LLM for Indian use cases? try Indic LLM Arena by AI4Bharat (IIT Madras) to find the most suitable LLM for Indian use cases.
link: https://arena.ai4bharat.org/#/chat

image

I tried it and it is quite handy. ๐Ÿ‘

Indian AI Developers org

After investing a lot of days and nights and passionately researching about our Sanatana Dharma, I'm excited to tell that I have over 150k Sanskrit Verses of different books with Translation, Transliteration and Shlokas in my collection.

My collection has around 10 books including Mahabharat, Ramayana, Gita, Markadeye Purana, Devi Mahatmya, Yoga Vasistha and so on.

Please share it with everyone and please upvote me for my work. More to come soon.

Link to Collection : https://huggingface.co/collections/JDhruv14/sanatana-dharma

Indian AI Developers org

After investing a lot of days and nights and passionately researching about our Sanatana Dharma, I'm excited to tell that I have over 150k Sanskrit Verses of different books with Translation, Transliteration and Shlokas in my collection.

My collection has around 10 books including Mahabharat, Ramayana, Gita, Markadeye Purana, Devi Mahatmya, Yoga Vasistha and so on.

Please share it with everyone and please upvote me for my work. More to come soon.

Link to Collection : https://huggingface.co/collections/JDhruv14/sanatana-dharma

Amazing brother, this is really important for our Indic LLMs

Indian AI Developers org

After investing a lot of days and nights and passionately researching about our Sanatana Dharma, I'm excited to tell that I have over 150k Sanskrit Verses of different books with Translation, Transliteration and Shlokas in my collection.

My collection has around 10 books including Mahabharat, Ramayana, Gita, Markadeye Purana, Devi Mahatmya, Yoga Vasistha and so on.

Please share it with everyone and please upvote me for my work. More to come soon.

Link to Collection : https://huggingface.co/collections/JDhruv14/sanatana-dharma

Amazing brother, this is really important for our Indic LLMs

Thanks man. It means a lot

Indian AI Developers org

under Mahabharatha collection, could you please change the 5th column name from translation to english_text ? and 6th column is actually translation which you have named it correctly.
A great work indeed.. thanks a lot. much needed

Indian AI Developers org

under Mahabharatha collection, could you please change the 5th column name from translation to english_text ? and 6th column is actually translation which you have named it correctly.
A great work indeed.. thanks a lot. much needed

I remember I have named one column transliteration in each dataset. Also, thank you for your kind words

Indian AI Developers org

Very soon my HF account will complete the firsr year. So, will it retain the 100 GB private storage, zero-GPU access, and inference capabilities after completing my first year as a user?

Indian AI Developers org
โ€ข
edited 6 days ago

After investing a lot of days and nights and passionately researching about our Sanatana Dharma, I'm excited to tell that I have over 150k Sanskrit Verses of different books with Translation, Transliteration and Shlokas in my collection.

My collection has around 10 books including Mahabharat, Ramayana, Gita, Markadeye Purana, Devi Mahatmya, Yoga Vasistha and so on.

Please share it with everyone and please upvote me for my work. More to come soon.

Link to Collection : https://huggingface.co/collections/JDhruv14/sanatana-dharma

Awesome job bro. You can fine tune models based on this and also publish them on HF referencing the same dataset.
Also publish everything on Kaggle.

Indian AI Developers org

Very soon my HF account will complete the firsr year. So, will it retain the 100 GB private storage, zero-GPU access, and inference capabilities after completing my first year as a user?

Don't know, haven't completed 1 year yet.

Indian AI Developers org

hey everyone,
read my blog on NPU and openVINO toolkit

https://huggingface.co/blog/Neural-Hacker/openvino

Indian AI Developers org

Happy New year

Sign up or log in to comment