Several days ago, I was playing a chinese card game 鋤大D with a bunch of friends and I was wondering what were the different odds of obtaining certain sets of cards. So instead of calculating these probabilities, why not just estimate them ?
The rules of the game
Let me explain here briefly how this game works. We start with a shuffled deck of 52 cards and distribute them to 4 players, one by one, eventually everyone will have 13 cards. Obviously we have our own terms in chinese but it will be easier to use the poker terms here. The players then arrange the cards to pairs, straights, flushes, full houses etc. The one with the 3♦ can start the game and must play something that includes this 3♦ according to the rules. Cards may only be played as singles or in groups of two, three or five, in combinations which resemble poker hands. The next player (clockwise or counterclockwise) then plays something with the same number of cards that has a higher rank than what the previous player played. If nobody has or wants to play a set of cards with a higher rank, then the player can play what he wants.
The card with the highest rank is the 2 followed by A and so forth, with their respective suit so in increasing order it looks like this
Card 

3♦ 
3♣ 
3♥ 
3♠ 
4♦ 
4♣ 
4♥ 
4♠ 
5♦ 
5♣ 
5♥ 
5♠ 
6♦ 
6♣ 
6♥ 
6♠ 
7♦ 
7♣ 
7♥ 
7♠ 
8♦ 
8♣ 
8♥ 
8♠ 
9♦ 
9♣ 
9♥ 
9♠ 
10♦ 
10♣ 
10♥ 
10♠ 
J♦ 
J♣ 
J♥ 
J♠ 
Q♦ 
Q♣ 
Q♥ 
Q♠ 
K♦ 
K♣ 
K♥ 
K♠ 
A♦ 
A♣ 
A♥ 
A♠ 
2♦ 
2♣ 
2♥ 
2♠ 
Combinations of cards
So, recall that everyone has 13 cards. In fact, there are a total of 635 013 559 600 possible hands (in comparison Poker with a hand of 5 cards has 2 598 960 possible hands).
The combinations in this game is pretty straight forward if you have played some Poker :
 Combination of 2 cards : pairs
 Combination of 3 cards : three of a kind
Combinations of 5 cards : (in increasing rank, a Flush beats a Straight etc.)
 Straight (sequential of 5 cards)
 Flush (5 cards of the same suit)
 Full house (3 same and a pair)
 Four of a kind (4 same and a card)
Straight flush (sequential and same suit)
Building the framework
As I don’t know any package that could help me define the rules of this game, I’ll just build up the whole framework from scratch. First thing we need is to shuffle the cards and shuffling the cards is the same as randomly reassinging the order of appearence of the cards (i.e. 1 to 52).
set.seed(1)
shuffle<sample(1:52,52,replace=F)
print(shuffle)
## [1] 14 19 29 45 10 43 44 30 28 3 9 8 46 15 49 51 26 35 13 36 38 7 20
## [24] 4 41 11 1 48 21 40 27 34 25 50 39 12 24 2 22 6 32 23 52 5 42 16
## [47] 33 47 17 37 31 18
Applied to our deck yields



So the first card is 4♣, the second card is 5♥ and so on. Dealing the cards is the same as saying that the first player gets every forth card and so on.
Here around that little table, 4 friends, Alibaba, Baidu, Chinamobile and Didi are playing this game. So according to our shuffled deck, Alibaba will get 4♣,3♣,7♠ …




As there are 13 cards, the player often needs a few minutes to order his hand and figure out a strategy. So after a minute, the players realise they have




We have here 4 hands of 13 cards, Alibaba got a three of a kind, Didi was even luckier and got even 2 three of a kinds.
In fact the players got
Player  Singles  Pair  Three_of_a_kind  Four_of_a_kind 

Alibaba  6  2  1  0 
Baidu  7  3  0  0 
Chinamobile  9  2  0  0 
Didi  5  1  2  0 
So we can clearly see that according to the first shuffle, we got 8 pairs, 3 three of a kind, 2 full houses and 0 four of a kind.
Let’s make them play a 1000 times.
I’d recommend to set the shuffle orders once and reuse them later. It takes up some space but very often it turns out to be very useful.
Simulations<1000
#library(foreach)
#library(doParallel)
cl<makeCluster(4)
registerDoParallel(cl)
shuffles<foreach(game=1:Simulations)%dopar%{sample(1:52,52,replace=F)}
What we really want is the aggregated information of a game, e.g. the number of pairs, three of a kinds etc. So I designed a function that will give me the information I need according to a certain shuffled order.
one.game<function(deck,shuffle,i){
shuffle %>% deck[.,] %>% mutate(Player=dist) %>% group_by(Player,Rank) %>% summarise(n=n()) %>%
summarise(
Pair = sum(n==2),
Three_of_a_kind = sum(n==3),
Four_of_a_kind = sum(n==4),
Game = i
)
}
Games<foreach(game=1:Simulations,.combine=rbind.data.frame,.packages="tidyverse")%dopar%{
one.game(deck,shuffles[[game]],game)
}
stopCluster(cl)
Games
## # A tibble: 4,000 x 5
## Player Pair Three_of_a_kind Four_of_a_kind Game
## <fctr> <int> <int> <int> <int>
## 1 Alibaba 2 2 0 1
## 2 Baidu 4 0 0 1
## 3 Chinamobile 4 0 0 1
## 4 Didi 3 1 0 1
## 5 Alibaba 2 0 1 2
## 6 Baidu 6 0 0 2
## 7 Chinamobile 3 0 0 2
## 8 Didi 3 0 0 2
## 9 Alibaba 3 0 0 3
## 10 Baidu 3 0 0 3
## # ... with 3,990 more rows
Our friends have been hard at work in playing all these 1000 games … for science!
From simulations to probabilities
Recall that we wish to estimate the probabilities that a hand of 13 cards contains a certain subset of cards. We will start with the combinations with
 same rank, e.g. (AAA), (QQ), (KKKK)
 same suit, e.g. (♠♠♠♠♠), (♣♣♣♣♣)
 sequence, e.g. (9,10,J,Q,K)
Same Rank
Let’s start with the most frequent ones, the pairs. So let PAIR be the random variable counting the numbers of pair in a hand. PAIR can take values in \([0,6]\) because a hand has 13 cards. The indicator that interests us is : \[\frac{\#PAIR=0}{4000}\approx\mathbb{P}[PAIR=0]\]
and equivalently
\[1\frac{\#PAIR=0}{4000}\approx\mathbb{P}[PAIR>0]\]
Pair  Count  Odds 

0  88  0.02 
1  447  0.11 
2  1044  0.26 
3  1326  0.33 
4  845  0.21 
5  238  0.06 
6  12  0.00 
So according to our simulations, 88 games out of the \(4 \times 1000\) didn’t have any pairs.
These are simple counts, because the hidden pairs in the three/four of a kinds have not been counted. So if we want to count these hidden pairs, we would need to build a new indicator because there is \(1\) pair in a three of a kind and \(2\) pairs in a four of a kind. So let Three_of_a_kind be the random variable counting the number of three of a kinds in a hand. It can take values in [0,3] and let Four_of_a_kind be the random variable counting the number of four of a kind in a hand. It can take values in [0,2]. Our new indicator is thus
\[All.Pairs = Pair+Three\_of\_a\_kind+2\cdot Four\_of\_a\_kind\]
Hidden.PAIR<Games %>% mutate(
All.Pairs = Pair + Three_of_a_kind + 2*Four_of_a_kind
) %>% group_by(All.Pairs) %>% summarise(Count=n(),Odds = n()/(4*Simulations))
Hidden.PAIR %>% kable(align="c")
All.Pairs  Count  Odds 

0  2  0.00 
1  53  0.01 
2  561  0.14 
3  1641  0.41 
4  1392  0.35 
5  331  0.08 
6  20  0.00 
As we can see, the difference in probabilities are huge because the pairs are very frequent.
We can also notice that \(All.Pairs=0\) has vanished. Indeed, as there are only 13 ranks and 13 cards in a hand, the probability to have no pairs at all is very small and our simulations did not simulate such a hand. In fact these rare hands have to have all 13 ranks, from A to K. Only the suits are random. We therefore have 4 choices for the first card that is A, then another 4 choices for the second card 2, 4 choices for 3, … 4 choices for K therefore :
\[ \mathbb{P}[PAIR=0] = \dfrac{(C_1^4)^{13}}{C_{13}^{52}} = 0.00011 \] So in other words, the 4 friends will need to play a whopping \(2500\) games in order for someone to stumble upon a hand without a pair. Lastly, we can see that the probabilities are not only shifted upwards.
Pair  Simple  All 

0  0.02  0.00 
1  0.11  0.01 
2  0.26  0.14 
3  0.33  0.41 
4  0.21  0.35 
5  0.06  0.08 
6  0.00  0.00 
So depending on your curiosity you might want to simulate the former or the latter Let’s now go over to the full houses which are once again a combination of a three of a kind and a pair. The simple version is pretty straight forward and is the minimum between the number of pairs and the number of three of a kinds because we need the same number of both of them to make a full house. \[Full\_house = \min(Three\_of\_a\_kind,Pair)\]
Games %>% mutate(
Full_houses = pmin(Pair,Three_of_a_kind)
) %>% group_by(Full_houses) %>% summarise(Count=n(),Odds = n()/(4*Simulations))%>% kable(align="c")
Full_houses  Count  Odds 

0  2311  0.58 
1  1617  0.40 
2  72  0.02 
As expected, there’s a more complicated version that takes into account the three/four of a kinds. For this, I designed a function that works like this :
For a certain hand of 13 cards
 All pairs and three of a kinds are matched up
 If we still have three of a kinds, which means that there are less pairs than three of a kinds, we will look for three and four of a kinds in order the break them down into pairs.
 If at this stage we still have four of a kinds, we will then break them into three of a kinds.
For example, let’s take a look at this hand :
 2 four of a kinds \(\rightarrow\) 8 cards
 1 three of a kind \(\rightarrow\) 3 cards
 1 pair \(\rightarrow\) 2 cards
The pair will be matched up with the three of a kind to make the first full house. The 2 four of a kinds will be broken down into 2 pairs and 1 three of a kind so we can build a full house. This hands has therefore 2 full houses. By the way, the maximum number of full houses in a hand is 2 because we only have 13 cards per hand.
full_house_count<function(Two,Three,Four){
# we need *Three* pairs, so we will need to convert the Three/Four_of_a_kind to pairs if there are not enough pairs
Simple.FH<pmin(Two,Three)
# How many Three do we have
Threes.remaining<pmax(0,ThreeTwo)
# We can break down Threes
Broken.Threes<(Threes.remaining)%/%2 #
Threes.remaining<Threes.remainingBroken.Threes
# Convert Fours into pairs
pairs.converted.3 < pmin(Four*2,Threes.remaining) # we have Four*2 pairs and need Threes.remaining pairs
Three_of_a_kind = Simple.FH+Broken.Threes+pairs.converted.3
pairs.remaining.4 < (2*Fourpairs.converted.3) # We substract the converted pairs from the Four_of_a_kind
Four_of_a_kind<pairs.remaining.4%/%3 # We then divide this by 3 to find the number of remaining Full Houses built on Four_of_a_kind e.g. AA,AA, KK => can be built into one Full House (AAA,KK) and AA,AA,KK,KK,DD,DD => can be built into 2 Full Houses (AAA,DD) and (KKK,DD)
Three_of_a_kind+Four_of_a_kind
}
Games %>% mutate(
All.Full_houses = full_house_count(Pair,Three_of_a_kind,Four_of_a_kind)
) %>% group_by(All.Full_houses) %>% summarise(Count=n(),Odds = n()/(4*Simulations))%>% kable(align="c")
All.Full_houses  Count  Odds 

0  2264  0.57 
1  1659  0.41 
2  77  0.02 
The differences between these two indicators are not massive due to the rareness of the three/four of a kinds.
Results
What interested me most was the probability to get at least one subset of cards. According to these simulations, we can see that
Pairs  1.00 
Three_of_a_kinds  0.47 
Full_houses  0.43 
Four_of_a_kinds  0.03 
Same suit
This section is the same and a lot more easier because there are no three of a kind etc. So, let SUIT be the random variable counting the number of cards with the same suit. SUIT can take values in \([1,13]\). So the lower bound is \(1\) because obviously there has to be a least one suit and the upper bound is \(13\) because every card could be of the same suit. There’s at least one flush when (SUIT > 4), there’s even a rare occurence when the hand has 2 flushes of the same suit (SUIT > 9). I just need to modify my code a little in order take this into account.
one.game<function(deck,shuffle,i){
shuffle %>% deck[.,] %>% mutate(Player=dist) %>% group_by(Player,Suit) %>% summarise(n=n()) %>% group_by(Player) %>%
summarise(
Flush = sum(n>4 & n<10), # only one flush
Flushx2 = sum(n>9), # 2 flushes (most tricky part)
Flushes = Flush+Flushx2*2,
Game=i
)
}
cl<makeCluster(4)
registerDoParallel(cl)
set.seed(1)
Games.suit<foreach(game=1:Simulations,.combine=rbind.data.frame,.packages="tidyverse")%dopar%{
one.game(deck,shuffles[[game]],game)
}
stopCluster(cl)
Games.suit
## # A tibble: 4,000 x 5
## Player Flush Flushx2 Flushes Game
## <fctr> <int> <int> <dbl> <int>
## 1 Alibaba 0 0 0 1
## 2 Baidu 1 0 1 1
## 3 Chinamobile 2 0 2 1
## 4 Didi 1 0 1 1
## 5 Alibaba 0 0 0 2
## 6 Baidu 1 0 1 2
## 7 Chinamobile 1 0 1 2
## 8 Didi 0 0 0 2
## 9 Alibaba 0 0 0 3
## 10 Baidu 0 0 0 3
## # ... with 3,990 more rows
Games.suit %>% group_by(Flushes) %>% summarise(
Odds=n()/(4*Simulations)
) %>% kable()
Flushes  Odds 

0  0.34 
1  0.60 
2  0.06 
The sequences
Now if we want to calculate the odds of a straight, this might be a little more tricky. I designed a function that checks for every unique rank if a sequence can be built. Obviously when the numbers of unique ranks are less than 5, we cannot build a straight. If there are more than 5 cards we can look for the difference by substracting the rank from the next rank, 5 ranks at a time. If there is a set of 5 ranks that meet the condition (difference = 1 for all 5 cards), we consider that there’s at least one straight.
is.seq<function(C){
l<length(C)
if(l<5) return(F)
idx<1:4
(0:(l5)) %>% map(`+`,idx) %>% map(.f=function(x) all((C[x+1]C[x])==1)) %>% unlist %>% any
}
# example
is.seq(1:13)
## [1] TRUE
is.seq(c(1,3,5,6,7,8,10,12))
## [1] FALSE
This function seems to be functional, let’s apply this to the 1000 games.
one.game<function(deck,shuffle,i){
shuffle %>% deck[.,] %>% mutate(Player=dist) %>% group_by(Player,Rank) %>% summarise(n=n()) %>%
group_by(Player) %>% summarise(straight=is.seq(as.integer(Rank)),
Game=i
)
}
cl<makeCluster(4)
registerDoParallel(cl)
Games.straight<foreach(game=1:Simulations,.combine=rbind.data.frame,.packages="tidyverse")%dopar%{one.game(deck,shuffles[[game]],game)}
stopCluster(cl)
Games.straight %>% group_by(Player) %>% summarise(
Straight_prob=sum(straight)/(Simulations)
) %>% kable()
Player  Straight_prob 

Alibaba  0.52 
Baidu  0.56 
Chinamobile  0.51 
Didi  0.53 
Convergence and Final Result
We can now finally see whether our results converge after \(n\) simulations.
So up until 500 simulations, the estimations are very volatile. As you can see after around 1500 simulations our probabilities seem to settle down and converge to some value.
Last but not least, here’s our results in a table :
Subset  P[Subset>0] (estimated) 

Three of a kind : e.g. AAA  0.47 
Straight : e.g. (10,J,Q,K,A)  0.53 
Flush : e.g. ♠♠♠♠♠  0.66 
Full house : e.g. KKK+QQ  0.43 
Four of a kind : e.g. 2222  0.03 
I’d expect that the subset with a higher rank would come less frequently but as these simulations show, Flushes that can beat Straights are somehow relatively more frequent.
But again, these simulations don’t take into account the fact that players mostly don’t want to break full houses or four of a kinds to make a flush.
Thanks for reading, happy new year and I hope that you liked it!
Credits