import json
from pytorch_pretrained_bert import cached_path

url = "https://s3.amazonaws.com/datasets.huggingface.co/personachat/personachat_self_original.json"

# Download and load JSON dataset
personachat_file = cached_path(url)
with open(personachat_file, "r", encoding="utf-8") as f:
    dataset = json.loads(f.read())

# Tokenize and encode the dataset using our loaded GPT tokenizer
def tokenize(obj):
    if isinstance(obj, str):
        return tokenizer.convert_tokens_to_ids(tokenizer.tokenize(obj))
    if isinstance(obj, dict):
        return dict((n, tokenize(o)) for n, o in obj.items())
    return list(tokenize(o) for o in obj)

dataset = tokenize(dataset)
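To see what the recursive `tokenize` helper does to nested JSON without downloading anything, here is a minimal sketch. `ToyTokenizer` is a hypothetical stand-in (whitespace splitting plus a vocabulary that grows on demand), not the GPT tokenizer, and the keys in `sample` are made up for illustration. It only mimics the two methods the snippet calls.

```python
class ToyTokenizer:
    """Hypothetical stand-in exposing the two methods used by tokenize()."""

    def __init__(self):
        self.vocab = {}

    def tokenize(self, text):
        # Assumption: plain whitespace splitting, unlike the real GPT tokenizer
        return text.split()

    def convert_tokens_to_ids(self, tokens):
        # Assign the next free id to every token not seen before
        return [self.vocab.setdefault(t, len(self.vocab)) for t in tokens]


tokenizer = ToyTokenizer()


def tokenize(obj):
    # Strings become lists of ids; dicts and lists are walked recursively
    if isinstance(obj, str):
        return tokenizer.convert_tokens_to_ids(tokenizer.tokenize(obj))
    if isinstance(obj, dict):
        return dict((n, tokenize(o)) for n, o in obj.items())
    return list(tokenize(o) for o in obj)


# Made-up nested structure, only meant to show that the shape is preserved
sample = {"persona": ["i like hunting ."], "dialog": [{"turns": ["hi there ."]}]}
encoded = tokenize(sample)
print(encoded)
```

The nesting of dicts and lists is kept as-is; only the string leaves are replaced by lists of integer ids.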
getting the same error
same error here too
Should be fixed now
@thomwolf the error still persists. Unable to download the json dataset due to that issue.
I fixed the error. It was an error on my end. I had to reconfigure the AWS credentials.
I am still getting the same error. Please help.
@sashank06 I am still getting the error, can you please share how you rectified the error.
this URL has worked for me
"https://s3.amazonaws.com/datasets.huggingface.co/personachat/personachat_self_original.json"
Thanks Khaled, this "https://s3.amazonaws.com/datasets.huggingface.co/personachat/personachat_self_original.json" worked for me too.
It worked for me with that url = "https://s3.amazonaws.com/datasets.huggingface.co/personachat/personachat_self_original.json"
Thanks Khaled
Hi, could you explain the data format like this?
train_self_original.txt file:
1 your persona: i like to remodel homes.
2 your persona: i like to go hunting.
3 your persona: i like to shoot a bow.
4 your persona: my favorite holiday is halloween.
5 hi , how are you doing ? i am getting ready to do some cheetah chasing to stay in shape . \t you must be very fast . hunting is one of my favorite hobbies . \t my mom was single with 3 boys , so we never left the projects .|i try to wear all black every day . it makes me feel comfortable .|well nursing stresses you out so i wish luck with sister|yeah just want to pick up nba nfl getting old|i really like celine dion . what about you ?|no . i live near farms .|i wish i had a daughter , i am a boy mom . they are beautiful boys though still lucky|yeah when i get bored i play gone with the wind my favorite movie .|hi how are you ? i am eating dinner with my hubby and 2 kids .|were you married to your high school sweetheart ? i was .|that is great to hear ! are you a competitive rider ?|hi , i am doing ok . i am a banker . how about you ?|i am 5 years old|hi there . how are you today ?|i totally understand how stressful that can be .|yeah sometimes you do not know what you are actually watching|mother taught me to cook ! we are looking for an exterminator .|i enjoy romantic movie . what is your favorite season ? mine is summer .|editing photos takes a lot of work .|you must be very fast . hunting is one of my favorite hobbies .
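Judging only from the sample above, each line starts with a line number, persona lines carry a `your persona:` prefix, and dialogue lines hold an utterance, a response, and a `|`-separated list of candidate replies whose last entry repeats the response. A rough sketch of a parser under those assumptions follows. It also assumes the fields are separated by real tab characters and skips empty fields. `parse_persona_lines` is a hypothetical helper, not part of any library.

```python
def parse_persona_lines(lines):
    """Split numbered lines into persona sentences and dialogue turns."""
    persona, turns = [], []
    for raw in lines:
        raw = raw.rstrip("\n")
        if not raw.strip():
            continue
        # Drop the leading line number (everything up to the first space)
        _, _, text = raw.partition(" ")
        if text.startswith("your persona:"):
            persona.append(text[len("your persona:"):].strip())
            continue
        # Assumption: tab-separated fields, empty fields ignored
        fields = [f.strip() for f in text.split("\t") if f.strip()]
        if not fields:
            continue
        turns.append({
            "utterance": fields[0],
            "response": fields[1] if len(fields) > 1 else None,
            "candidates": fields[-1].split("|") if len(fields) > 2 else [],
        })
    return persona, turns


example = [
    "1 your persona: i like to go hunting.",
    "2 hi , how are you ?\tyou must be very fast .\t\tno . i live near farms .|you must be very fast .",
]
persona, turns = parse_persona_lines(example)
print(persona)
print(turns[0]["candidates"])
```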
Hi, I am trying to download the file from the S3 bucket indicated in the link, but it raises an error:
NoCredentialsError: Unable to locate credentials
This happens at the function
s3_etag(url)
It seems that some kind of credentials is needed. Any help would be welcome.
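One possible workaround, sketched under the assumption that the file is readable over plain HTTPS (as the comments reporting success with the `https://s3.amazonaws.com/...` URL suggest): fetch it with the standard library, so that no AWS client or credentials are involved. `fetch` and `load_json` are hypothetical helper names, and whether the URL is still live is not something this sketch can guarantee.

```python
import json
import shutil
import urllib.request
from pathlib import Path

URL = "https://s3.amazonaws.com/datasets.huggingface.co/personachat/personachat_self_original.json"


def fetch(url, dest):
    """Download url to dest unless dest already exists; return dest as a Path."""
    dest = Path(dest)
    if not dest.exists():
        # Plain HTTP(S) request: no AWS credentials are looked up here
        with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
            shutil.copyfileobj(response, out)
    return dest


def load_json(path):
    """Read a UTF-8 encoded JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

Calling `load_json(fetch(URL, "personachat_self_original.json"))` would then give the same `dataset` object as the snippet at the top, ready to be passed to `tokenize`.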