Definition: AI is the broad field of creating machines or systems that perform tasks requiring human intelligence, such as reasoning, learning, and language understanding.
- Machine Learning (ML)
- Natural Language Processing (NLP)
- Generative AI
- Computer Vision
- Robotics
Definition: Machine Learning (ML) is a subset of AI focused on developing algorithms that enable computers to learn from data and make predictions or decisions.
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
- Semi-Supervised Learning
- Transfer Learning
- Linear Regression: Models the linear relationship between a dependent variable and one or more independent variables (a fitting sketch follows this list).
- Support Vector Machines (SVM): Finds the maximum-margin hyperplane that best separates classes; variants (SVR) extend the idea to regression.
- Convolutional Neural Networks (CNNs): Used for image and video recognition tasks.
- Recurrent Neural Networks (RNNs): Suited for sequential data, like time series or language.
- Decision Trees: A tree-like model that splits data into branches for predictions.
- Random Forest: An ensemble method combining multiple decision trees to improve accuracy.
- Neural Networks: Computational models inspired by the human brain, used in deep learning.
- Transformers: Attention-based architecture for NLP; models such as BERT and GPT are built on it.
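To make the supervised-learning entries concrete, here is a minimal linear-regression sketch using scikit-learn (the toy data is invented for illustration; any of the models above slots into the same fit/predict pattern):

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: y is roughly 2*x + 1 with a little noise
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

model = LinearRegression()
model.fit(X, y)                        # Learn slope and intercept from the data
print(model.coef_, model.intercept_)   # Learned parameters, near 2 and 1
print(model.predict([[6.0]]))          # Prediction for an unseen input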
- Clustering: Groups similar data points without pre-labeled categories (a k-means/PCA sketch follows this list).
- k-Means Clustering: Partitions a dataset into 'k' groups based on feature similarity.
- Principal Component Analysis (PCA): Reduces data dimensionality while preserving as much of the variance as possible.
- Anomaly Detection: Identifies rare items or events that differ significantly from the majority of data.
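A matching unsupervised sketch with scikit-learn (toy 2-D points invented for illustration): k-means assigns cluster labels with no ground truth, and PCA projects the points onto their main axis of variance:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Two loose blobs of unlabeled 2-D points
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)     # Cluster assignment for each point, e.g. [0 0 0 1 1 1]

pca = PCA(n_components=1).fit(X)
print(pca.transform(X))   # Each point reduced to one coordinate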
- Q-Learning: Learns the value of actions in a given state without a model of the environment (an update-rule sketch follows this list).
- Deep Q-Networks (DQN): Combines Q-learning with deep neural networks to handle large state spaces.
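The heart of Q-learning is one temporal-difference update on a table of state-action values; the sketch below applies it to a single made-up transition (a toy fragment, not a complete agent):

import numpy as np

n_states, n_actions = 2, 2
Q = np.zeros((n_states, n_actions))  # Q-table: estimated value of each (state, action)
alpha, gamma = 0.1, 0.9              # Learning rate and discount factor

# One observed transition: action 1 in state 0 gave reward 1.0, ending in state 1
state, action, reward, next_state = 0, 1, 1.0, 1

# Move Q[s, a] toward reward + discounted value of the best next action
Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
print(Q)  # Only Q[0, 1] has been updated (to 0.1)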
Transfer Learning: Knowledge gained from one task or dataset is reused to improve model performance on a related task or a different dataset, as sketched below.
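A minimal fine-tuning sketch (assuming PyTorch and torchvision 0.13+ are installed): load a network pretrained on ImageNet, freeze its weights, and retrain only a new output layer for the target task:

import torch.nn as nn
from torchvision import models

# Load a ResNet-18 with weights pretrained on ImageNet
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained feature extractor
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer for a hypothetical 10-class target task;
# during training, gradients flow only through this new layer
model.fc = nn.Linear(model.fc.in_features, 10)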
Definition: Deep Learning is a subset of ML that uses neural networks with many layers to model complex patterns in data; it is especially effective for tasks involving large datasets and rich representations.
- Convolutional Neural Networks (CNNs): Primarily used for image and video recognition (a minimal sketch follows this list).
- Recurrent Neural Networks (RNNs): Well-suited for sequential data, like time series or language.
- Generative Adversarial Networks (GANs): Generates new data similar to the training data.
- Transformers: Attention-based architecture behind NLP models such as BERT and GPT.
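To ground these architectures, here is a minimal CNN in PyTorch (a toy model invented for illustration, not any published architecture):

import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)  # Learn local image features
        self.pool = nn.MaxPool2d(2)                            # Downsample 28x28 -> 14x14
        self.fc = nn.Linear(8 * 14 * 14, num_classes)          # Classify the pooled features

    def forward(self, x):
        x = self.pool(torch.relu(self.conv(x)))
        return self.fc(x.flatten(1))

model = TinyCNN()
print(model(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 10])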
Generative models built on these architectures expose a few sampling parameters that shape their output:
- Temperature: Controls the randomness of the model's predictions. Lower values produce more deterministic outputs, while higher values increase randomness and diversity.
- Top-k Sampling: Restricts sampling to the k most probable next tokens, introducing randomness within a controlled range.
- Top-p (Nucleus) Sampling: Samples from the smallest set of tokens whose cumulative probability exceeds the threshold p, adapting diversity to the shape of the distribution (all three are demonstrated in the sketch after this list).
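All three knobs can be demonstrated on a toy next-token distribution with NumPy (the logits below are invented for illustration):

import numpy as np

logits = np.array([2.0, 1.0, 0.5, 0.1, -1.0])  # Toy scores for 5 candidate tokens

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Temperature: divide logits by T before softmax; T < 1 sharpens, T > 1 flattens
print(softmax(logits / 0.5))  # more deterministic
print(softmax(logits / 1.5))  # more random and diverse

# Top-k: keep only the k most probable tokens, then renormalize
probs = softmax(logits)
k = 3
top = np.argsort(probs)[-k:]
mask = np.zeros_like(probs)
mask[top] = probs[top]
print(mask / mask.sum())

# Top-p: keep the smallest set of tokens whose cumulative probability reaches p
p = 0.9
order = np.argsort(probs)[::-1]                          # most to least probable
cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
nucleus = np.zeros_like(probs)
nucleus[order[:cutoff]] = probs[order[:cutoff]]
print(nucleus / nucleus.sum())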
Definition: Natural Language Processing (NLP) is a subfield of AI focused on enabling computers to read, understand, and generate human language.
- Text Classification: Categorizes text into predefined categories.
- Language Modeling: Predicts the next word or sequence in a sentence (e.g., GPT models).
- Sentiment Analysis: Determines the sentiment expressed in text.
- Named Entity Recognition (NER): Identifies and classifies named entities in text, such as people, organizations, and locations.
- Machine Translation: Translates text between languages (e.g., evaluated using BLEU).
- Traditional ML Models: Naive Bayes, Support Vector Machines (a Naive Bayes sketch follows this list).
- Deep Learning Models: RNNs, LSTMs, Transformers (e.g., BERT, GPT).
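For the traditional route, a complete text classifier is a few lines with scikit-learn (the tiny labeled dataset below is invented for illustration; real use needs far more data):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy labeled data: 1 = positive sentiment, 0 = negative
texts = ["great movie", "loved it", "terrible film", "waste of time"]
labels = [1, 1, 0, 0]

# TF-IDF features feeding a Naive Bayes classifier
clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(texts, labels)
print(clf.predict(["what a great film"]))  # Label for an unseen review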
- BLEU (Bilingual Evaluation Understudy): Evaluates machine-generated translations by comparing their n-gram overlap with reference translations (see the NLTK sketch after this list).
- ROUGE: Measures the overlap between machine-generated and reference summaries.
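As an API illustration, NLTK ships a sentence-level BLEU implementation (assuming nltk is installed; sentence-level BLEU on short strings is noisy and is shown here only to demonstrate the call):

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "is", "on", "the", "mat"]]  # List of tokenized references
candidate = ["the", "cat", "sat", "on", "the", "mat"]   # Tokenized machine output

# Smoothing avoids zero scores when higher-order n-grams have no matches
score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")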
Definition: Generative AI is a subfield of AI focused on generating new content (text, images, audio) that resembles its training data, using models such as GANs and transformers.
- Text Generation: Produces new text based on input prompts (e.g., GPT models).
- Image Generation: Creates images from text descriptions or other images (e.g., DALL-E).
- Audio Generation: Synthesizes new audio, like music or speech.
- BLEU: Evaluates machine-generated text quality against reference text.
- ROUGE: Assesses the overlap between machine-generated and reference summaries.
In generative models like GPT, temperature, top-k, and top-p (nucleus sampling) are critical for controlling the creativity and coherence of the generated text:
- Temperature: Controls the level of randomness in generated content.
- Top-k: Limits the next token selection to the top k most probable choices.
- Top-p (Nucleus Sampling): Selects tokens with cumulative probabilities that meet or exceed a threshold (p).
AWS Bedrock: Enables access to foundation models (FMs) from providers such as Anthropic (Claude) and Amazon (Titan), allowing temperature, top-k, and top-p to be set at inference time to tune generated text.
Example Bedrock Usage (a minimal sketch assuming boto3 is configured with AWS credentials; the Claude model ID below is illustrative and must be enabled in your account and region):
import json
import boto3

# Text generation goes through the 'bedrock-runtime' client (the 'bedrock'
# client only manages models). GPT models are not offered on Bedrock, so an
# Anthropic Claude model ID is assumed here; replace it with one you have enabled.
client = boto3.client('bedrock-runtime')
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "temperature": 0.7,  # Balances creativity and coherence
    "top_k": 50,         # Consider only the 50 most likely tokens
    "top_p": 0.9,        # Nucleus sampling with a 90% cumulative threshold
    "max_tokens": 100,   # Maximum number of tokens to generate
}
response = client.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=json.dumps(body),
)
result = json.loads(response["body"].read())
print(result["content"][0]["text"])
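Note that invoke_model returns the raw model response as a streaming body, and the JSON layout of that body (here, Claude's content list) differs by model provider, so the parsing lines above are provider-specific.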