Analysis of the supplier diversity of J&J’s pharmaceutical companies using Bayesian networks


The purpose of this report is to identify the relationships in the supplier base of Johnson & Johnson, especially the business arrangements with minority- and women-owned companies. I chose the data of three pharmaceutical companies, which contain 28 variables describing eight departments’ purchase records (n = 725,414). To prepare the data for analysis, I combined each supplier company with its branches by using the ID number. I also split the data into 36 months according to the variable “PO.Create”, so that the purchase records can be indexed by month.
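A minimal Python sketch of these two preparation steps follows. It assumes three things: each record is a dict, "PO.Create" holds an ISO date string (YYYY-MM-DD), and a lookup from branch IDs to parent IDs is available. The field name "Supplier.ID", the lookup table and the function names are hypothetical. Only "PO.Create" comes from the dataset.

```python
from collections import defaultdict
from datetime import date


def merge_branches(records, branch_to_parent):
    """Map each branch supplier ID to its parent company's ID.

    branch_to_parent is a hypothetical lookup table; IDs that are not in it
    are assumed to belong to parent companies already.
    """
    merged = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's records are not modified
        sid = rec["Supplier.ID"]
        rec["Supplier.ID"] = branch_to_parent.get(sid, sid)
        merged.append(rec)
    return merged


def split_by_month(records, field="PO.Create"):
    """Group records into (year, month) buckets, assuming ISO dates (YYYY-MM-DD)."""
    months = defaultdict(list)
    for rec in records:
        d = date.fromisoformat(rec[field])
        months[(d.year, d.month)].append(rec)
    return dict(sorted(months.items()))


# Toy records (made up for illustration only)
records = [
    {"Supplier.ID": "S1-B2", "PO.Create": "2010-01-15"},
    {"Supplier.ID": "S2", "PO.Create": "2010-01-20"},
    {"Supplier.ID": "S1", "PO.Create": "2010-02-03"},
]
by_month = split_by_month(merge_branches(records, {"S1-B2": "S1"}))
for month, recs in by_month.items():
    print(month, [r["Supplier.ID"] for r in recs])
# (2010, 1) ['S1', 'S2']
# (2010, 2) ['S1']
```

Keeping the month as a (year, month) tuple makes the 36 buckets sort chronologically without any extra parsing.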

Methods and results

1. Suppliers and purchasing behavior

To gain basic knowledge of the relationship between the attributes of the suppliers and the purchasing behavior of the pharmaceutical companies, it is helpful to classify the suppliers and to examine the trend of monthly purchase requests.


1.1 Purchasing behavior of each supervisor

I analyzed the number of purchases made by a specific supervisor between 2010 and 2012 to identify the supplier diversity in each department.

[Figure 1: Cumulative frequency of a supervisor’s purchase requests]

Figure 1 is the cumulative frequency plot of one supervisor’s purchase requests. It indicates that requests increase at the beginning and the end of each year. This step also showed that the data can be extracted by month.
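A cumulative frequency curve like the one in Figure 1 can be built in two steps: count the requests per month, then take a running sum. A steeper rise in January and December corresponds to the pattern described above. This is a sketch with made-up requests and a hypothetical function name, not the code behind Figure 1.

```python
from collections import Counter
from itertools import accumulate


def cumulative_monthly_requests(request_months, all_months):
    """Cumulative number of purchase requests at the end of each month.

    request_months: one (year, month) tuple per request.
    all_months: every month in the study period, in order, so that months
    without requests still appear (with no increase).
    """
    counts = Counter(request_months)
    monthly = [counts.get(m, 0) for m in all_months]
    return list(accumulate(monthly))


# Made-up requests for one supervisor in 2010
all_months = [(2010, m) for m in range(1, 13)]
requests = [(2010, 1), (2010, 1), (2010, 2), (2010, 12), (2010, 12), (2010, 12)]
cum = cumulative_monthly_requests(requests, all_months)
print(cum)  # [2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 6]
```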

1.2 Size and diversity classifications of suppliers

Since this research mainly focuses on Minority and Women-Owned Business Enterprise (M/WBE) suppliers, I categorized the suppliers by size and type. Table 1 shows that small, minority- and women-owned suppliers make up around 5.14% of the total. Most suppliers (around 60%) are large companies that are not owned by minorities or women.

[Table 1: Size and diversity classifications of suppliers]

Figures 2 and 3 illustrate the proportion of each category over time. Non-M/WBE suppliers account for a much higher proportion, around 70% to 80% (the black line in Figure 2). The proportions of small and M/WBE suppliers do not exceed 40% in any month. This result indicates that the pharmaceutical companies of Johnson & Johnson prefer suppliers that are large and not owned by women or minorities.

[Figures 2 and 3: Proportion of each supplier category over time]
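The monthly proportions in plots like Figures 2 and 3 come from two steps: count records per (month, category) pair, then divide by the monthly total. The sketch below assumes each purchase record has already been reduced to a (month, category) pair. The data, category labels and function name are illustrative only.

```python
from collections import Counter, defaultdict


def monthly_category_shares(records):
    """Share of each supplier category among the purchase records of each month.

    records: iterable of (month, category) pairs, one per purchase record.
    """
    counts = defaultdict(Counter)
    for month, category in records:
        counts[month][category] += 1
    shares = {}
    for month, c in sorted(counts.items()):
        total = sum(c.values())
        shares[month] = {cat: n / total for cat, n in c.items()}
    return shares


# Made-up records with illustrative category labels
toy = [("2010-01", "Not M/WBE")] * 3 + [("2010-01", "Small M/WBE")] + [
    ("2010-02", "Not M/WBE"),
    ("2010-02", "M/WBE"),
]
shares = monthly_category_shares(toy)
print(shares["2010-01"])  # {'Not M/WBE': 0.75, 'Small M/WBE': 0.25}
```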

1.3 M/WBE suppliers and order type

[Figure 4: Proportions of suppliers by purchase order type over time]
Figure 4 shows the proportions of suppliers for the five purchase order types over time. The second set of five plots compares the five order types. It shows that the suppliers for Canada purchases were not minority- or women-owned companies at any point in the period. For the other order types, there is no specific pattern in the proportions.

2. Internal business and cluster analysis

The purpose of the cluster analysis is to check the similarities and differences in purchasing behavior across internal business operations. I used the correlation distance, the Partitioning Around Medoids (PAM) algorithm, and the graphical lasso to group the 11 departments. The scatterplots show the correlations between the departments: Alza and Janssen Canada have no relationship with any other department. From the correlation matrix, I calculated the distances between the departments to illustrate the similarities among these eleven variables.
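The post does not state which correlation distance was used. A common choice, assumed here, is d = 1 - r, where r is the Pearson correlation between two departments’ monthly purchase series. With this choice, anti-correlated departments end up far apart. The sketch below builds the distance matrix with the standard library; department names and counts are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distance_matrix(series):
    """Distance d = 1 - r between every pair of departments.

    series: dict mapping a department name to its monthly purchase counts.
    d is 0 for perfectly correlated series and 2 for perfectly anti-correlated ones.
    """
    names = list(series)
    dist = [[1 - pearson(series[a], series[b]) for b in names] for a in names]
    return names, dist


# Hypothetical departments with made-up monthly counts
series = {"Dept A": [1, 2, 3, 4], "Dept B": [2, 4, 6, 8], "Dept C": [4, 3, 2, 1]}
names, dist = correlation_distance_matrix(series)
for name, row in zip(names, dist):
    print(name, [round(d, 3) for d in row])
```

The resulting matrix can then be passed to a clustering routine such as PAM or hierarchical clustering.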



Hierarchical cluster analysis was then applied to these distances, and two clusters were selected using Ward’s method (see Figure 5). The silhouette values of the PAM algorithm also suggest two clusters. Furthermore, the graphical lasso fitted to all 11 departments produces an estimate of the precision matrix, which is the basis of the Bayesian network.
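The average silhouette width can be computed directly from a distance matrix and used to compare candidate partitions, such as two clusters versus three. The sketch below implements the standard silhouette definition. It does not implement PAM or Ward’s method themselves, and the data are made up. Higher averages indicate better-separated clusters.

```python
def silhouette_widths(dist, labels):
    """Silhouette width s(i) = (b - a) / max(a, b) for every item.

    a: mean distance from item i to the other members of its own cluster.
    b: smallest mean distance from item i to the members of another cluster.
    Items in singleton clusters get s(i) = 0 by convention.
    Requires at least two distinct cluster labels.
    """
    n = len(labels)
    widths = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            widths.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels)
            if c != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths


def mean_silhouette(dist, labels):
    w = silhouette_widths(dist, labels)
    return sum(w) / len(w)


# Four made-up items on a line; items 0-1 and 2-3 are clearly separated
points = [0, 1, 10, 11]
dist = [[abs(p - q) for q in points] for p in points]
good = mean_silhouette(dist, [0, 0, 1, 1])  # well-separated partition
bad = mean_silhouette(dist, [0, 1, 0, 1])   # poor partition
print(round(good, 3), round(bad, 3))
```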

3. Regression trees and biclustering

After recoding the variables into categories, I searched for patterns among women- and minority-owned suppliers in specific categories using recursive partitioning and regression trees. I split the data based on the size and type of the suppliers, since my main interest was what influences the choice of suppliers. By comparing the nodes of the regression trees, I found that the site and the commodity category are the two main factors in the data partitions.
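Tree methods such as recursive partitioning choose each split by how much it reduces impurity in the target. The sketch below shows this for a categorical target using Gini impurity. It simplifies in one way: it splits on every level of a variable at once, whereas CART-style trees usually search binary groupings of the levels. The column names and rows are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(rows, feature, target):
    """Drop in Gini impurity when rows are split on every level of one feature.

    This is a simplified multiway split; CART-style trees usually search
    binary groupings of the levels instead.
    """
    parent = gini([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[target])
    child = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return parent - child


def best_feature(rows, features, target):
    """Feature whose split gives the largest impurity decrease."""
    return max(features, key=lambda f: impurity_decrease(rows, f, target))


# Made-up purchases: hypothetical sites, categories and M/WBE flags
rows = [
    {"site": "A", "category": "IT", "mwbe": "yes"},
    {"site": "A", "category": "Lab", "mwbe": "yes"},
    {"site": "B", "category": "IT", "mwbe": "no"},
    {"site": "B", "category": "Lab", "mwbe": "no"},
]
print(best_feature(rows, ["category", "site"], "mwbe"))  # site
```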


To identify the relationship between these two factors, I applied biclustering in the next step. Biclustering simultaneously groups the purchase locations and the related types of purchasing behavior. The result is a set of 4 biclusters. Figure 6 shows the heatmaps of the four biclusters, and the second bicluster covers most of the data compared with the others. The heatmap of all the clusters (Figure 7) shows that almost 90% of the rows and columns in the dataset are grouped into the four biclusters. Thus, a relationship between the site and the commodity category does exist in the dataset. Based on these results, it is worth extracting this association and analyzing it in more depth with a Bayesian network.
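The post does not name the biclustering algorithm. As one illustration of how a bicluster can be scored, the sketch below computes Cheng and Church’s mean squared residue for a submatrix of a site-by-category table. A value of 0 means the selected rows and columns follow a perfectly additive pattern (row effect plus column effect). The matrix is made up, and this score is not necessarily the criterion used in the analysis.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng and Church's mean squared residue H(I, J) of a submatrix.

    H = mean over (i, j) of (a_ij - a_iJ - a_Ij + a_IJ) ** 2, where a_iJ is
    the row mean, a_Ij the column mean and a_IJ the overall mean of the
    submatrix. H = 0 for a perfectly additive bicluster.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    total = 0.0
    for i in range(n_r):
        for j in range(n_c):
            resid = sub[i][j] - row_means[i] - col_means[j] + overall
            total += resid ** 2
    return total / (n_r * n_c)


# Made-up purchase counts: rows = sites, columns = commodity categories
m = [
    [1, 2, 3, 9],
    [2, 3, 4, 0],
    [5, 6, 7, 1],
]
print(mean_squared_residue(m, [0, 1, 2], [0, 1, 2]))     # close to 0: additive block
print(mean_squared_residue(m, [0, 1, 2], [0, 1, 2, 3]))  # larger: column 3 breaks the pattern
```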

4. Effects on supplier diversity

4.1 Generalized linear models on the size of suppliers

I used a generalized linear model to predict the probability that a women- or minority-owned enterprise is chosen as a supplier. The predictors were the supplier’s size, the site, the category and the total amount of the order. After recoding the category variable into 8 levels and applying a log(x + 1) transformation to the total order amount, I fitted two models.
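The post does not spell out the model formula. Assume a binomial GLM with a logit link, the usual choice when the outcome is a probability. The main-effects version can then be written as follows. Here Y_i = 1 if purchase i went to an M/WBE supplier, and each categorical predictor enters as a vector of dummy variables. The notation is illustrative, not taken from the original analysis.

```latex
\operatorname{logit}\Pr(Y_i = 1)
  = \beta_0
  + \boldsymbol{\beta}_{\mathrm{size}}^{\top}\mathbf{x}_i^{\mathrm{size}}
  + \boldsymbol{\beta}_{\mathrm{site}}^{\top}\mathbf{x}_i^{\mathrm{site}}
  + \boldsymbol{\beta}_{\mathrm{cat}}^{\top}\mathbf{x}_i^{\mathrm{cat}}
  + \gamma\,\log(\mathrm{amount}_i + 1),
\qquad
\operatorname{logit}(p) = \log\frac{p}{1 - p}
```

A model with interaction effects adds terms built from products of the corresponding dummy variables.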


Comparing the AICs, the first model is much better, since it includes the interaction effects of the site, the category and the order type. The models show that small suppliers are more likely to be owned by women and minorities, while most combinations of site and category have only a slight effect. In addition, the larger the total amount of the order, the more likely it is that a non-M/WBE supplier is chosen. However, generalized linear models cannot provide all the information. They assume a linear relationship on the link scale, and these models focus only on the size of suppliers. Therefore, I applied a Bayesian network in the next section.
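The AIC comparison can be sketched in Python. AIC = 2k - 2 log L, where k is the number of estimated parameters and L is the maximized likelihood. Lower values are better, so extra interaction terms must raise the likelihood enough to pay for their parameters. The fitted probabilities below are made up, and the function names are hypothetical.

```python
import math


def bernoulli_log_likelihood(y, p):
    """Log-likelihood of 0/1 outcomes y under fitted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1 - pi) for yi, pi in zip(y, p))


def aic(y, p, n_params):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * bernoulli_log_likelihood(y, p)


# Made-up outcomes (1 = M/WBE supplier chosen) and fitted probabilities
y = [1, 0, 1, 0]
richer = aic(y, [0.9, 0.1, 0.8, 0.2], n_params=3)  # e.g. with interaction terms
simpler = aic(y, [0.5, 0.5, 0.5, 0.5], n_params=1)  # intercept only
print(round(richer, 3), round(simpler, 3))
```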

4.2 Bayesian network

I examined the relationships among seven variables: the site, size and type of each supplier, and the category, type and total amount of each purchase order. To do so, I constructed a Bayesian network on each of the thirty-six months of data. I then counted how often each relationship appeared across these networks and calculated the percentage of occurrences. The result is Table 2, which contains 15 relationships.
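Assume a structure-learning step has already produced an edge list for each monthly network; that step is not shown. The frequencies and percentages in Table 2 can then be tallied as below. Edges are treated as undirected and counted at most once per month. The variable names and edge lists are made up.

```python
from collections import Counter


def edge_frequencies(monthly_edges):
    """Count how many monthly networks contain each (undirected) edge.

    monthly_edges: one list of (node, node) pairs per month.
    Returns {edge: (count, percentage of months)}.
    """
    counts = Counter()
    for edges in monthly_edges:
        # sort each pair so A-B and B-A count as the same edge, and use a set
        # so an edge is counted at most once per month
        counts.update({tuple(sorted(e)) for e in edges})
    n_months = len(monthly_edges)
    return {edge: (c, 100 * c / n_months) for edge, c in counts.items()}


# Made-up edge lists for three months
monthly_edges = [
    [("MWBE", "Category"), ("Size", "OrderType")],
    [("Category", "MWBE")],
    [("MWBE", "Category"), ("Size", "OrderType"), ("Category", "MWBE")],
]
freq = edge_frequencies(monthly_edges)
for edge, (count, pct) in sorted(freq.items()):
    print(edge, count, f"{pct:.1f}%")
```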

[Table 2: Frequencies and percentages of the relationships across the 36 monthly Bayesian networks]

In Table 2, the diagonal cells contain the frequencies of occurrence, and the cells below the diagonal contain the percentages. The results show that M/WBE status is highly correlated with the category of the commodity and the type of the order. The size of the supplier is strongly connected with the type of the order. Neither M/WBE status nor supplier size is associated with the total amount of the orders.



