Python Random Sample from List: Without Replacement, With Replacement & NumPy
Picking random elements from a list is one of those tasks that sounds trivial until you need to do it correctly. Should the same element be able to appear twice? Does order matter? Do some elements have a higher probability of being picked? Should the result be reproducible for debugging?
Each of these questions points to a different Python tool. random.sample() gives you sampling without replacement. random.choices() gives you sampling with replacement. numpy.random.choice() handles both and works efficiently on large arrays. sklearn.utils.random.sample_without_replacement() is optimised for the specific patterns that machine learning pipelines need.
This guide covers these four library methods plus a manual shuffle-based approach in full detail - syntax, parameters, working code, edge cases, and exactly when to use each - along with seeding for reproducibility, weighted sampling, a performance comparison, and five real-world use cases from data science and software development. For background on the core Python data types used throughout, see Board Infinity's guide on boolean in Python.
Who This Guide Is For
Sampling With vs Without Replacement: The Core Distinction
Before looking at any code, understanding the difference between sampling with and without replacement is essential - it determines which tool to use for every sampling task.
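The distinction can be seen in a minimal sketch using only the standard library. The list contents and the seed value are arbitrary choices made for illustration.

```python
import random

rng = random.Random(0)  # local generator; the seed 0 is an arbitrary choice
deck = ["A", "B", "C", "D"]

# Without replacement: each position is drawn at most once,
# so asking for all four items yields a permutation of the list.
without = rng.sample(deck, k=4)

# With replacement: every draw is independent, so repeats are possible
# and k may exceed the population size.
with_repl = rng.choices(deck, k=6)

print(sorted(without) == sorted(deck))  # True
print(len(with_repl))                   # 6
```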
Method 1: random.sample() - Python Random Sample Without Replacement
random.sample() is the standard library function for random sampling without replacement. It returns a new list of k elements drawn from distinct positions in the population, and it leaves the original sequence unmodified.
Syntax and Parameters
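As a quick reference, the sketch below summarises the call signature in comments and shows the optional counts argument, which assumes Python 3.9 or newer. The colour list is made up for illustration.

```python
import random

# random.sample(population, k, *, counts=None)
#   population - a sequence such as a list, tuple, range or str
#                (sets are rejected from Python 3.11 onward)
#   k          - number of items to draw, with 0 <= k <= len(population)
#   counts     - optional repeat count per element (Python 3.9+)
colours = ["red", "blue"]

# Equivalent to sampling from ["red", "red", "blue", "blue", "blue"].
picked = random.sample(colours, k=3, counts=[2, 3])
print(len(picked))  # 3
```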
Basic Usage - Sample from List
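A minimal usage sketch follows. The names and the seed are placeholders, and the exact names drawn depend on the seed and the Python version.

```python
import random

random.seed(7)  # arbitrary seed so repeated runs of this script match
names = ["Ana", "Ben", "Cara", "Dev", "Eli", "Fay"]

winners = random.sample(names, k=3)

print(winners)      # three distinct names in random order
print(len(names))   # 6 - the original list is left untouched
```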
Edge Cases and Error Handling
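The sketch below exercises three edge cases that are easy to check directly: a sample larger than the population, a sample of size zero, and a population containing duplicate values.

```python
import random

items = [1, 2, 3]

# Asking for more items than the population holds raises ValueError.
error_message = None
try:
    random.sample(items, k=5)
except ValueError as exc:
    error_message = str(exc)

# k = 0 is valid and returns an empty list.
empty = random.sample(items, k=0)

# Duplicate values occupy different positions, so both copies can be drawn.
dupes = random.sample([1, 1, 2], k=3)

print(error_message is not None)  # True
print(empty)                      # []
print(sorted(dupes))              # [1, 1, 2]
```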
Method 2: random.choices() - Python Random Sample With Replacement
random.choices() (added in Python 3.6) selects k elements from the population with replacement - the same element can be chosen multiple times. It also supports weighted sampling, where some elements have a higher probability of being selected than others.
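A short sketch of weighted sampling with replacement is shown here. The outcome labels, weights, and seed are invented for illustration; weights are relative, so they do not need to sum to 1.

```python
import random
from collections import Counter

rng = random.Random(1)  # arbitrary seed
outcomes = ["common", "rare", "epic"]

# Relative weights: "common" is 16 times as likely as "epic".
draws = rng.choices(outcomes, weights=[80, 15, 5], k=1000)

tally = Counter(draws)
print(tally.most_common())  # "common" should dominate the counts
```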
random.sample() vs random.choices(): Key Differences
Method 3: numpy.random.choice() - For Arrays and Data Science
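numpy.random.choice() is the NumPy equivalent for random sampling and a common choice in data science contexts when working with NumPy arrays. It supports sampling both with and without replacement as well as weighted sampling, and it works efficiently with large datasets. In new code, the same functionality is available through the Generator API as numpy.random.default_rng().choice().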
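The following sketch uses the Generator API (numpy.random.default_rng), whose choice method takes the same size, replace, and p arguments as the legacy function. The array contents and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed
data = np.arange(10, 20)

# Without replacement: five distinct values from the array.
no_repeat = rng.choice(data, size=5, replace=False)

# With replacement: size may exceed the length of the array.
with_repeat = rng.choice(data, size=15, replace=True)

# Weighted: p holds probabilities and must sum to 1.
weighted = rng.choice(["a", "b", "c"], size=8, p=[0.7, 0.2, 0.1])

print(no_repeat, with_repeat.shape, weighted)
```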
Method 4: sklearn.utils.random.sample_without_replacement() - For ML Pipelines
sklearn.utils.random.sample_without_replacement() is a specialised function from scikit-learn designed for high-performance sampling without replacement in machine learning pipelines. It samples integer indices rather than actual elements, and its method parameter selects among three algorithms (tracking_selection, reservoir_sampling, and pool) suited to different population/sample size ratios, with auto as the default.
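Because the function returns indices, a typical pattern is to sample positions and then look up the corresponding records. The sketch below assumes scikit-learn is installed; the records list is a stand-in for real data.

```python
from sklearn.utils.random import sample_without_replacement

# Hypothetical dataset: 1000 placeholder rows.
records = [f"row-{i}" for i in range(1000)]

# Returns an array of distinct integer indices in [0, n_population).
idx = sample_without_replacement(
    n_population=len(records), n_samples=5, random_state=0
)

subset = [records[i] for i in idx]
print(subset)
```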
Seeding for Reproducibility
Reproducibility is non-negotiable in data science and machine learning. When you sample training data, split datasets, or generate random batches, you need to guarantee that running the same code tomorrow produces the same result as today.
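A minimal standard-library sketch of seeding is given below. It shows both the global seed and a dedicated random.Random instance, which keeps the seed local to one part of the program. The seed value 42 is arbitrary.

```python
import random

population = list(range(100))

# Global seed: affects every later call to the module-level functions.
random.seed(42)
first = random.sample(population, k=5)

random.seed(42)  # reset to the same seed
second = random.sample(population, k=5)

# Dedicated generator: same seed, but global state is left alone.
local_rng = random.Random(42)
third = local_rng.sample(population, k=5)

print(first == second == third)  # True
```

Results are repeatable for a given seed on the same Python version; the exact values drawn by sample() are not guaranteed to stay identical across Python releases.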
Method 5: Manual Sampling Using Shuffle
For cases requiring complete control or environments without NumPy, you can implement sampling manually using random.shuffle().
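One way to sketch this is to shuffle a copy and slice off the first k items. The helper name sample_by_shuffle is hypothetical, not a library function.

```python
import random

def sample_by_shuffle(population, k, rng=random):
    """Return k items without replacement by shuffling a copy (hypothetical helper)."""
    if not 0 <= k <= len(population):
        raise ValueError("k must be between 0 and len(population)")
    pool = list(population)  # copy, so the caller's sequence is untouched
    rng.shuffle(pool)        # in-place shuffle of the copy
    return pool[:k]

letters = ["a", "b", "c", "d", "e"]
chosen = sample_by_shuffle(letters, 3)
print(chosen)
```

Shuffling touches every element, so this costs time proportional to the whole population even when k is small; random.sample() is usually the better fit unless the full shuffled order is also needed.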
Real-World Use Cases
Use Case 1: Train/Test Split Without Replacement
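A simple split can be sketched by sampling test indices without replacement and assigning everything else to the training set. The function name, the 20% test fraction, and the toy data are assumptions made for illustration.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Split rows into (train, test) lists; hypothetical helper."""
    rng = random.Random(seed)
    n_test = round(len(rows) * test_fraction)
    # Sample positions, not values, so duplicate rows are handled correctly.
    test_idx = set(rng.sample(range(len(rows)), k=n_test))
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test

rows = list(range(50))
train, test = train_test_split(rows)
print(len(train), len(test))  # 40 10
```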
Use Case 2: Bootstrap Sampling With Replacement
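Bootstrapping resamples the data with replacement, at the original sample size, many times over. The sketch below estimates a rough percentile interval for the mean; the measurements, the number of resamples, and the helper name are illustrative assumptions.

```python
import random
import statistics

def bootstrap_means(values, n_resamples=1000, seed=0):
    """Mean of each bootstrap resample; hypothetical helper."""
    rng = random.Random(seed)
    n = len(values)
    return [
        statistics.fmean(rng.choices(values, k=n))  # resample with replacement
        for _ in range(n_resamples)
    ]

values = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8]  # made-up measurements
means = sorted(bootstrap_means(values))

# Rough 95% percentile interval from the sorted resample means.
lower, upper = means[25], means[974]
print(f"mean is roughly between {lower:.2f} and {upper:.2f}")
```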
Use Case 3: A/B Test Group Assignment
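One possible approach is to shuffle the user list once and cut it in half, so every user lands in exactly one group. The user IDs and the helper name are placeholders.

```python
import random

def assign_groups(user_ids, seed=0):
    """Randomly split users into groups A and B; hypothetical helper."""
    rng = random.Random(seed)
    # Sampling the full length returns a shuffled copy of the list.
    shuffled = rng.sample(user_ids, k=len(user_ids))
    half = len(shuffled) // 2
    return {"A": shuffled[:half], "B": shuffled[half:]}

users = [f"user{i}" for i in range(11)]
groups = assign_groups(users)
print(len(groups["A"]), len(groups["B"]))  # 5 6
```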
Use Case 4: Data Augmentation With Weighted Sampling
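A common pattern, sketched here under assumed data, is to oversample a minority class by weighting each example inversely to its class frequency. The labels and class sizes are invented for illustration.

```python
import random
from collections import Counter

# Made-up imbalanced dataset: 90 "cat" labels and 10 "dog" labels.
labels = ["cat"] * 90 + ["dog"] * 10
freq = Counter(labels)

# Each example is weighted by 1 / (size of its class), so both
# classes carry the same total weight.
weights = [1 / freq[label] for label in labels]

rng = random.Random(0)  # arbitrary seed
picked_idx = rng.choices(range(len(labels)), weights=weights, k=200)

picked_labels = Counter(labels[i] for i in picked_idx)
print(picked_labels)  # counts should be roughly balanced
```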
Use Case 5: Lottery Draw
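A lottery draw is sampling without replacement from a range of numbers. The sketch below assumes a 6-from-49 format and defaults to random.SystemRandom, which draws from the operating system's entropy source and cannot be seeded; pass a seeded random.Random instead when a repeatable draw is needed for testing.

```python
import random

def draw_lottery(pool_size=49, picks=6, rng=None):
    """Draw distinct numbers from 1..pool_size; hypothetical helper."""
    rng = rng or random.SystemRandom()  # OS entropy, not reproducible
    return sorted(rng.sample(range(1, pool_size + 1), k=picks))

ticket = draw_lottery()
print(ticket)  # six distinct numbers in ascending order
```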
Performance Comparison
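Timings depend on hardware, Python version, and the ratio of k to the population size, so it is worth measuring on your own data. The sketch below is a small timeit harness for three standard-library approaches; the population size, k, and repeat count are arbitrary.

```python
import random
import timeit

population = list(range(100_000))
k = 1_000

def by_sample():
    return random.sample(population, k)

def by_shuffle():
    pool = population[:]   # copy, then shuffle the whole list
    random.shuffle(pool)
    return pool[:k]

def by_choices():
    # Included for reference; this one samples WITH replacement.
    return random.choices(population, k=k)

timings = {
    fn.__name__: timeit.timeit(fn, number=5)
    for fn in (by_sample, by_shuffle, by_choices)
}

for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds:.4f}s for 5 runs")
```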
Common Mistakes and How to Avoid Them
Conclusion
Python gives you multiple tools for random sampling, each designed for a specific context. random.sample() is the default for sampling without replacement from any Python sequence. random.choices() handles sampling with replacement and weighted probability distributions. numpy.random.choice() is the tool of choice in data science contexts. sklearn.utils.random.sample_without_replacement() is the specialist for ML pipelines where you need fast sampling of indices from very large populations.
Three things to take away: first, always set a seed when sampling in experiments or any reproducibility-sensitive context. Second, the default for most data science tasks is without replacement - use random.choices() with replacement only when you explicitly need bootstrapping or augmentation. Third, the Generator API (numpy.random.default_rng() followed by rng.choice()) is the modern NumPy standard - prefer it over the legacy numpy.random.choice() and np.random.seed() functions in new code.
The best next step is to practice with real datasets. Board Infinity's guide on building a data science portfolio has project ideas where random sampling techniques like train/test splits, bootstrap sampling, and cross-validation are applied in realistic contexts.
Frequently Asked Questions
Q1. How do I get a random sample from a list in Python? Use random.sample(your_list, k) where k is the number of elements you want. This returns a new list of k unique elements chosen randomly without replacement. Example: random.sample([1, 2, 3, 4, 5], 3) might return [4, 1, 3].
Q2. What is the difference between random.sample() and random.choices()? random.sample() samples without replacement - each position is drawn at most once and k cannot exceed the population size. random.choices() samples with replacement - the same element can appear multiple times and k can be any non-negative integer. random.choices() also supports weighted probabilities; random.sample() does not.
Q3. How do I do Python random sample without replacement? Use random.sample(population, k) - it always samples without replacement. For NumPy arrays, use numpy.random.choice(arr, size=k, replace=False). For ML pipelines, use sklearn.utils.random.sample_without_replacement(n_population, n_samples).
Q4. How do I do Python random sample with replacement? Use random.choices(population, k=k) from the standard library. For NumPy, use numpy.random.choice(arr, size=k, replace=True). For weighted sampling with replacement, use random.choices(population, weights=[...], k=k).
Q5. How do I make random.sample() reproducible? Set a seed before calling it: random.seed(42) then random.sample(...). For NumPy, use the Generator API: rng = numpy.random.default_rng(seed=42) then rng.choice(...). The same seed reproduces the same result on the same Python or NumPy version; the exact values are not guaranteed to stay identical across releases.
Q6. Can random.sample() return duplicates? random.sample() never returns an element from the same index twice. However, if your input list contains duplicate values like [1, 1, 2, 3], both instances of 1 can appear in the result because sample() treats each index as unique, not each value.
Q7. What is random choice without replacement in Python? It means selecting items from a collection where each selected item cannot be picked again. In Python, random.sample() implements this. random.choice() (singular) picks only one element and does not prevent duplicates if called multiple times.
Further Reading
Board Infinity Guides:
- Guide to Random Module in Python
- Boolean in Python
- Building a Data Science Portfolio
- while True in Python
- Python List to Tuple
External Resources:
- Python Official Docs - random.sample() - authoritative specification including all parameters and edge case behaviour
- NumPy Docs - numpy.random.Generator.choice - official documentation for the modern NumPy Generator API
- scikit-learn - sample_without_replacement - official scikit-learn documentation with method selection guidance