Google Colab: Using a GPU with TensorFlow version 1.0.0

Hoggaan commented May 12, 2020

I trained on the same dataset used in the course, and it did well. But for my own chatbot, I prepared a dataset of at least 60 questions and answers and ran it through the data preprocessing, and the model is not being saved after training. That's my problem, sir. Any help?

onuryartasi commented May 12, 2020

That's your trained model. Use this method to restore it:

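import tensorflow as tf  # TensorFlow 1.x API

# Build the same model graph first: tf.train.Saver() needs its variables to exist.
checkpoint = "./chatbot_weights.ckpt"
session = tf.InteractiveSession()
session.run(tf.global_variables_initializer())
saver = tf.train.Saver()
saver.restore(session, checkpoint)  # overwrites the initialized values with the saved weights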

Hoggaan commented May 12, 2020

Sorry, dude. It seems we misunderstood each other a bit. I organized my own dataset to train a chatbot, following the Udemy course step by step (I already shared its link). Whenever I use the course's Cornell movie dataset, everything works; however, when I use my own dataset, things don't work properly: the trained models are not saved. Can you help with that?

Man, this is a huge problem for me. It's my final-year project and time is running out.

If you can help, I can share the notebook and the dataset so you can check them yourself. Thank you!

onuryartasi commented May 13, 2020

Your training set is too small to learn from, so your training process stops before it saves a checkpoint. Can you share the code or notebook?
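
One plausible mechanism behind this, sketched under assumptions: a common pattern in seq2seq tutorials (possibly, but not confirmed to be, the course's) iterates only over full batches, `len(questions) // batch_size` of them, and saves the checkpoint from inside that loop. The helper names, the 51-pair training split, and the batch sizes below are hypothetical, not taken from the notebook. With about 60 pairs, the training split can be smaller than a single batch, so the loop body, including any save, never runs:

```python
def split_into_batches(questions, answers, batch_size):
    """Yield only full batches; a trailing partial batch is dropped."""
    for batch_index in range(len(questions) // batch_size):
        start = batch_index * batch_size
        yield questions[start:start + batch_size], answers[start:start + batch_size]


def count_batches(n_pairs, batch_size):
    """Count how many training steps one epoch would take for n_pairs examples."""
    questions = [f"q{i}" for i in range(n_pairs)]
    answers = [f"a{i}" for i in range(n_pairs)]
    return sum(1 for _ in split_into_batches(questions, answers, batch_size))


# Hypothetical numbers: 60 Q&A pairs minus a 9-pair (15%) validation split.
n_train = 51
print(count_batches(n_train, batch_size=64))  # 0: no steps, so no checkpoint is ever written
print(count_batches(n_train, batch_size=8))   # 6
```

If this is the cause, a smaller batch size, more data, or an unconditional save after the last epoch would each make sure a checkpoint gets written.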

denizOgut commented Jun 11, 2020

Hello Onur Bey, I need to train the chatbot model from the Udemy course, but I keep getting an error. Could you help me? How can I reach you?

onuryartasi commented Jun 12, 2020

You can contact me here: https://www.linkedin.com/in/onuryartasi

daniel-kollanyi commented Jun 24, 2020

Thank you, you have saved me a lot of time and unnecessary stress over training the model! I have never used Google Colab before, so maybe it's a stupid question, but it seems to be using almost all of the GPU RAM before I can even start training the network:

Gen RAM Free: 12.3 GB | Proc size: 457.6 MB
GPU RAM Free: 567MB | Used: 10874MB | Util 95% | Total 11441MB

Eventually, the training dies with a ResourceExhaustedError. I have restarted the session many times, and when I check the GPU RAM usage, it's around 500MB (used).
Do you have any idea why it behaves this way? Any guess would be appreciated! Thanks!

daniel-kollanyi commented Jun 25, 2020

Found the issue; it seems to be a Colab bug: https://stackoverflow.com/questions/48750199/google-colaboratory-misleading-information-about-its-gpu-only-5-ram-available
After a few runtime restarts, it seems to be working fine. Thanks again @onuryartasi for the Colab config!

onuryartasi commented Jun 25, 2020

Hi, some Google Colab users only get access to part of the GPU, usually about 5% of it.
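
To check how much of the card a session can actually use, one option is to ask `nvidia-smi` directly through its `--query-gpu` and `--format=csv,noheader,nounits` options. This is a minimal sketch; the helper names are hypothetical, and `gpu_report()` only works on a machine where `nvidia-smi` is installed. Applied to the numbers in the printout above, the free share comes out at roughly 5% of the total:

```python
import subprocess

QUERY = "memory.free,memory.used,utilization.gpu,memory.total"


def parse_gpu_line(line):
    """Turn one CSV line from nvidia-smi (MiB, MiB, %, MiB) into a readable summary."""
    free, used, util, total = (int(field) for field in line.split(","))
    free_pct = 100 * free / total
    return (f"GPU RAM Free: {free}MB ({free_pct:.1f}%) | Used: {used}MB "
            f"| Util {util}% | Total {total}MB")


def gpu_report():
    """Query every visible GPU; requires the NVIDIA driver's nvidia-smi on PATH."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_gpu_line(line) for line in out.strip().splitlines()]


# The values from the printout above, in the CSV form nvidia-smi would emit.
print(parse_gpu_line("567, 10874, 95, 11441"))
# GPU RAM Free: 567MB (5.0%) | Used: 10874MB | Util 95% | Total 11441MB
```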

joelpelo commented Dec 25, 2020

I get this error. Is there another site where I can find a different link for this package?

!wget http://archive.ubuntu.com/ubuntu/pool/main/m/mesa/libglx-mesa0_18.0.5-0ubuntu0~18.04.1_amd64.deb
!dpkg -i --force-overwrite libglx-mesa0_18.0.5-0ubuntu0~18.04.1_amd64.deb

--2020-12-25 17:24:16-- http://archive.ubuntu.com/ubuntu/pool/main/m/mesa/libglx-mesa0_18.0.5-0ubuntu0~18.04.1_amd64.deb
Resolving archive.ubuntu.com (archive.ubuntu.com)... 91.189.88.142, 91.189.88.152, 2001:67c:1360:8001::24, ...
Connecting to archive.ubuntu.com (archive.ubuntu.com)|91.189.88.142|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-12-25 17:24:16 ERROR 404: Not Found.

dpkg: error: cannot access archive 'libglx-mesa0_18.0.5-0ubuntu0~18.04.1_amd64.deb': No such file or directory

onuryartasi commented Dec 29, 2020

Hello,
You can download the package from here: http://launchpadlibrarian.net/373093738/libglx-mesa0_18.0.5-0ubuntu0~18.04.1_amd64.deb
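
The two commands from the question can also be made to fall back automatically. This is a minimal sketch, not part of the notebook: it tries the original archive.ubuntu.com path first and then the Launchpad librarian URL from the reply above, using only the standard library. `fetch_first` and `install_deb` are hypothetical helpers, and there is no guarantee that either URL is still served:

```python
import shutil
import subprocess
import urllib.request

DEB = "libglx-mesa0_18.0.5-0ubuntu0~18.04.1_amd64.deb"
# Tried in order: the original path (404 in the log above), then the Launchpad copy.
MIRRORS = [
    "http://archive.ubuntu.com/ubuntu/pool/main/m/mesa/" + DEB,
    "http://launchpadlibrarian.net/373093738/" + DEB,
]


def fetch_first(urls, dest, opener=urllib.request.urlopen):
    """Save the first URL that downloads successfully to dest; return it, or None."""
    for url in urls:
        try:
            with opener(url) as resp, open(dest, "wb") as out:
                shutil.copyfileobj(resp, out)
            return url
        except OSError:
            # urllib's HTTPError (e.g. 404) and URLError are OSError subclasses.
            continue
    return None


def install_deb(path):
    """Same as the notebook's `!dpkg -i --force-overwrite ...` (needs root)."""
    subprocess.run(["dpkg", "-i", "--force-overwrite", path], check=True)


# In a Colab cell:
#   if fetch_first(MIRRORS, DEB):
#       install_deb(DEB)
```

Keeping the URLs in a list means another mirror can be added later without touching the download logic.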

adnvenkatesh commented Nov 10, 2021

Hi... I followed all the instructions in the course and started training on a GPU. I have trained for 100 epochs with different learning rates, but the minimum-loss weights are always the ones recorded at epoch 10, so the weights are not getting updated. Could you share how many epochs it took for the error to converge for you? And can I find the course's already-trained weights online?
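
One reason saved weights can stay frozen at an early epoch: a common pattern writes a checkpoint only when the validation loss reaches a new minimum, and may also stop early after several checks without improvement. Whether the course loop does exactly this is an assumption. The sketch below simulates that rule on made-up loss values; the function name and the `patience` parameter are hypothetical. If the loss bottoms out at epoch 10, every later epoch still trains, but nothing new is saved:

```python
def checkpoint_epochs(val_losses, patience=None):
    """Simulate 'save only on improvement' with optional early stopping.

    Returns (epochs at which a checkpoint would be written, last epoch run).
    """
    best = float("inf")
    saved = []
    checks_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            saved.append(epoch)  # where a TF1 loop would call saver.save(...)
            checks_without_improvement = 0
        else:
            checks_without_improvement += 1
            if patience is not None and checks_without_improvement >= patience:
                return saved, epoch  # early stopping
    return saved, len(val_losses)


# Made-up curve: improves for 10 epochs, then plateaus above the best value.
losses = [3.0 - 0.25 * i for i in range(10)] + [1.0] * 90
saved, last = checkpoint_epochs(losses)
print(saved[-1], last)                           # 10 100
print(checkpoint_epochs(losses, patience=5)[1])  # 15
```

Under this rule, a best epoch that never moves means the validation loss stopped improving, which points at the learning rate, the data, or overfitting rather than at the saving code.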

onuryartasi commented Nov 10, 2021

Hi, almost 3 years have passed, and I don't remember which learning rate or how many epochs I used.

Hoggaan commented Nov 10, 2021 via email

Yeah. Thank you for the feedback. At that time I was working on my final project. Now I am doing a master's in AI. :)

adnvenkatesh commented Nov 10, 2021

Thank you for the reply. Can you please suggest where the problem might be? It does not look like the error will converge anytime soon. I have tried different learning rates; this graph is for a learning rate of 0.001. If I increase the learning rate, the gradients explode; I tried decreasing it, but the error still does not converge. Should I just keep training for a large number of epochs, like 500, and wait patiently?
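
For the exploding-gradient symptom, a standard remedy is to clip gradients before applying them. The course code may already do some clipping; the sketch below is a framework-free illustration of clipping by global norm, the rule `tf.clip_by_global_norm` documents (each gradient is multiplied by `clip_norm / max(global_norm, clip_norm)`), applied to hypothetical gradient values. It shows the mechanism only and does not by itself guarantee convergence:

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Rescale gradient vectors so their joint L2 norm is at most clip_norm.

    Returns (clipped gradients, global norm before clipping).
    """
    global_norm = math.sqrt(sum(x * x for g in grads for x in g))
    scale = clip_norm / max(global_norm, clip_norm)  # 1.0 when already small enough
    return [[x * scale for x in g] for g in grads], global_norm


grads = [[6.0, 8.0], [0.0, 0.0]]  # hypothetical gradients for two tensors
clipped, norm = clip_by_global_norm(grads, clip_norm=5.0)
print(norm)     # 10.0
print(clipped)  # [[3.0, 4.0], [0.0, 0.0]]
```

In TF1 the same rule is available as `tf.clip_by_global_norm`, applied between an optimizer's `compute_gradients` and `apply_gradients` calls.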

onuryartasi commented Nov 10, 2021

I am very happy to hear that; I wish you success.

onuryartasi commented Nov 10, 2021

Sorry, I'm not working with machine learning anymore. Whatever I say would probably be wrong.

adnvenkatesh commented Nov 10, 2021

Thank you for the reply anyway.

adnvenkatesh commented Nov 10, 2021

Hoggaan, can you please help me with my problem if you can? This is now my final-year project.

MohammadHarisZia commented Jan 21, 2022

Hi there, does it still work? I am getting an error that Colab doesn't support TensorFlow 1.0 anymore, despite downgrading to CUDA 8.0.
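
This does not bring TensorFlow 1.0 back to Colab, but one hedged workaround for code written like the restore snippet earlier in the thread (`tf.InteractiveSession`, `tf.train.Saver`) is to reach the TF1 graph/session API through `tf.compat.v1` when TensorFlow 2.x is installed. Caveat: `tf.contrib` was removed in TensorFlow 2, so any part of the notebook that depends on it would still need porting. The helper names below are hypothetical:

```python
def needs_compat(version):
    """True for TensorFlow 2.x or later, where the TF1 API lives under tf.compat.v1."""
    return int(version.split(".")[0]) >= 2


def tf1_api():
    """Return a module exposing Session/Saver-style TF1 calls on TF 1.x or 2.x."""
    import tensorflow as tf  # imported lazily so this file loads without TensorFlow

    if needs_compat(tf.__version__):
        tf.compat.v1.disable_eager_execution()  # sessions and savers need graph mode
        return tf.compat.v1
    return tf


# Usage (with TensorFlow installed and the model graph already built):
#   tf1 = tf1_api()
#   session = tf1.InteractiveSession()
#   saver = tf1.train.Saver()
```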

adnvenkatesh commented Jan 22, 2022

I could not make it work in Google Colab; I got the same error as you. I trained it manually on my local machine.

MohammadHarisZia commented Jan 22, 2022

@adnvenkatesh, can you tell me whether the responses were any good? Also, I have an XPS 15 9570, which has thermal limitations, so could you send me your ckpt files just to test it?

adnvenkatesh commented Jan 22, 2022

As I mentioned in the discussion above, the error did not converge at all. The responses were very bad, just repetitions of random strings. I read in various sources about what the problem might be, and some of them said that the Adam optimizer has convergence problems. So I implemented a gradient descent optimizer, made some other changes, and got it to work.
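
The optimizer swap described here changes the update rule. As an illustration only (it reproduces neither the course code nor the convergence problem), the sketch below applies one step of plain gradient descent and one step of Adam, with the standard Adam defaults (beta1 = 0.9, beta2 = 0.999, eps = 1e-8), to the hypothetical loss f(x) = x². Gradient descent moves in proportion to the gradient, while Adam's normalized first step is close to the learning rate regardless of the gradient's size:

```python
import math


def sgd_step(x, grad, lr):
    """Plain gradient descent: the step size scales with the gradient."""
    return x - lr * grad


def adam_step(x, grad, lr, state, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; state holds (m, v, t) between calls."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    return x - lr * m_hat / (math.sqrt(v_hat) + eps), (m, v, t)


x0, lr = 1.0, 0.1
grad = 2 * x0                           # derivative of f(x) = x**2 at x0
print(round(sgd_step(x0, grad, lr), 6)) # 0.8
x_adam, _ = adam_step(x0, grad, lr, state=(0.0, 0.0, 0))
print(round(x_adam, 6))                 # 0.9
```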

MohammadHarisZia commented Jan 22, 2022

Can you kindly share your code? I just need to understand the issues. Anyway, a million thanks; the help honestly means a lot.

adnvenkatesh commented Jan 22, 2022

If I remember correctly, there are some very good implementations somewhere on GitHub that use a gradient descent optimizer in this context, together with bidirectional encoder layers. Try using them to understand it; they are far clearer. My code was too messy for you to understand. 😅

Sudar88 commented Feb 11, 2022

Hey all,
Has anyone recently managed to run training of this model (https://www.udemy.com/course/chatbot/) on Colab by following the given instructions?

MohammadHarisZia commented Feb 11, 2022

Hi there @Sudar88, unfortunately I tried a lot but could not make it work.

Sudar88 commented Feb 11, 2022

@MohammadHarisZia, thanks for the answer. After many attempts, I was not able to train this model by following the given instructions.
If anyone has found a solution, I hope they will respond.

onuryartasi/colab_gpu.ipynb