Before any analysis, both the R kernel and the Stata kernel need to be properly set up. We start with R, which first requires installing rpy2, the Python-to-R bridge package, using pip.
pip install rpy2
Requirement already satisfied: rpy2 in /opt/anaconda3/lib/python3.12/site-packages (3.5.17)
Requirement already satisfied: cffi>=1.15.1 in /opt/anaconda3/lib/python3.12/site-packages (from rpy2) (1.17.1)
Requirement already satisfied: jinja2 in /opt/anaconda3/lib/python3.12/site-packages (from rpy2) (3.1.4)
Requirement already satisfied: tzlocal in /opt/anaconda3/lib/python3.12/site-packages (from rpy2) (5.3.1)
Requirement already satisfied: pycparser in /opt/anaconda3/lib/python3.12/site-packages (from cffi>=1.15.1->rpy2) (2.21)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/anaconda3/lib/python3.12/site-packages (from jinja2->rpy2) (2.1.3)
Note: you may need to restart the kernel to use updated packages.
After we have done this, we then need to load the R extension within the Python kernel.
%load_ext rpy2.ipython
Finally, a quick test to make sure R now works. Any time we want R code to run, we have to prefix the cell with the %%R cell magic.
%%R
x <- c(1,2,3,4,5)
mean(x)
[1] 3
Now we have to install the Python-to-Stata dependencies to get Stata working in a Python kernel. We start by installing the pystata package.
pip install pystata
Requirement already satisfied: pystata in /opt/anaconda3/lib/python3.12/site-packages (0.0.1)
Requirement already satisfied: pandas in /opt/anaconda3/lib/python3.12/site-packages (from pystata) (2.2.2)
Requirement already satisfied: numpy>=1.26.0 in /opt/anaconda3/lib/python3.12/site-packages (from pandas->pystata) (1.26.4)
Requirement already satisfied: python-dateutil>=2.8.2 in /opt/anaconda3/lib/python3.12/site-packages (from pandas->pystata) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /opt/anaconda3/lib/python3.12/site-packages (from pandas->pystata) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in /opt/anaconda3/lib/python3.12/site-packages (from pandas->pystata) (2023.3)
Requirement already satisfied: six>=1.5 in /opt/anaconda3/lib/python3.12/site-packages (from python-dateutil>=2.8.2->pandas->pystata) (1.16.0)
Note: you may need to restart the kernel to use updated packages.
We then also install the stata_setup package.
pip install --upgrade --user stata_setup
Requirement already satisfied: stata_setup in /Users/scottoatley/.local/lib/python3.12/site-packages (0.1.3)
Requirement already satisfied: numpy in /opt/anaconda3/lib/python3.12/site-packages (from stata_setup) (1.26.4)
Requirement already satisfied: pandas in /opt/anaconda3/lib/python3.12/site-packages (from stata_setup) (2.2.2)
Requirement already satisfied: ipython in /opt/anaconda3/lib/python3.12/site-packages (from stata_setup) (8.27.0)
Requirement already satisfied: decorator in /opt/anaconda3/lib/python3.12/site-packages (from ipython->stata_setup) (5.1.1)
Requirement already satisfied: jedi>=0.16 in /opt/anaconda3/lib/python3.12/site-packages (from ipython->stata_setup) (0.19.1)
Requirement already satisfied: matplotlib-inline in /opt/anaconda3/lib/python3.12/site-packages (from ipython->stata_setup) (0.1.6)
Requirement already satisfied: prompt-toolkit<3.1.0,>=3.0.41 in /opt/anaconda3/lib/python3.12/site-packages (from ipython->stata_setup) (3.0.43)
Requirement already satisfied: pygments>=2.4.0 in /opt/anaconda3/lib/python3.12/site-packages (from ipython->stata_setup) (2.15.1)
Requirement already satisfied: stack-data in /opt/anaconda3/lib/python3.12/site-packages (from ipython->stata_setup) (0.2.0)
Requirement already satisfied: traitlets>=5.13.0 in /opt/anaconda3/lib/python3.12/site-packages (from ipython->stata_setup) (5.14.3)
Requirement already satisfied: pexpect>4.3 in /opt/anaconda3/lib/python3.12/site-packages (from ipython->stata_setup) (4.8.0)
Requirement already satisfied: python-dateutil>=2.8.2 in /opt/anaconda3/lib/python3.12/site-packages (from pandas->stata_setup) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /opt/anaconda3/lib/python3.12/site-packages (from pandas->stata_setup) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in /opt/anaconda3/lib/python3.12/site-packages (from pandas->stata_setup) (2023.3)
Requirement already satisfied: parso<0.9.0,>=0.8.3 in /opt/anaconda3/lib/python3.12/site-packages (from jedi>=0.16->ipython->stata_setup) (0.8.3)
Requirement already satisfied: ptyprocess>=0.5 in /opt/anaconda3/lib/python3.12/site-packages (from pexpect>4.3->ipython->stata_setup) (0.7.0)
Requirement already satisfied: wcwidth in /opt/anaconda3/lib/python3.12/site-packages (from prompt-toolkit<3.1.0,>=3.0.41->ipython->stata_setup) (0.2.5)
Requirement already satisfied: six>=1.5 in /opt/anaconda3/lib/python3.12/site-packages (from python-dateutil>=2.8.2->pandas->stata_setup) (1.16.0)
Requirement already satisfied: executing in /opt/anaconda3/lib/python3.12/site-packages (from stack-data->ipython->stata_setup) (0.8.3)
Requirement already satisfied: asttokens in /opt/anaconda3/lib/python3.12/site-packages (from stack-data->ipython->stata_setup) (2.0.5)
Requirement already satisfied: pure-eval in /opt/anaconda3/lib/python3.12/site-packages (from stack-data->ipython->stata_setup) (0.2.2)
Note: you may need to restart the kernel to use updated packages.
We now import the stata_setup package into the Python kernel.
import stata_setup
And finally we initialise Stata itself within the kernel.
stata_setup.config("/Applications/Stata", "se")
___ ____ ____ ____ ____ ®
/__ / ____/ / ____/ 18.0
___/ / /___/ / /___/ SE—Standard Edition
Statistics and Data Science Copyright 1985-2023 StataCorp LLC
StataCorp
4905 Lakeway Drive
College Station, Texas 77845 USA
800-STATA-PC https://www.stata.com
979-696-4600 stata@stata.com
Stata license: Unlimited-user network, expiring 14 Sep 2025
Serial number: 401809305318
Licensed to: Scott oatley
University of Edinburgh
Notes:
1. Unicode is supported; see help unicode_advice.
2. Maximum number of variables is set to 5,000 but can be increased;
see help set_maxvar.
We now test that Stata works with a simple display command.
%%stata
display 1+1
.
. display 1+1
2
.
And now back to R, to make sure both kernels can run in the same session.
%%R
y <- c(1,2,3,NA)
is.na(y) # returns a vector (F F F T)
[1] FALSE FALSE FALSE TRUE
Each missing-data handling method will now be explored using the four aforementioned software packages: R, Stata, Python, and Mplus. Some methods are not possible in certain software; when that happens, only the relevant software will be shown. We start with a simple simulation of an OLS linear regression model with three continuous, normally distributed independent variables with means ranging from 40 to 200 and standard deviations ranging from 5 to 50. We proceed in the order R, then Stata, then Python, and finally Mplus. All methods will be demonstrated.
%%R
# Set seed to ensure reproducibility
set.seed(123)

# Generate metric independent variables
x1 <- rnorm(1000, mean = 40, sd = 12)
x2 <- rnorm(1000, mean = 200, sd = 50)
x3 <- rnorm(1000, mean = 150, sd = 5)

# Generate metric dependent variable
y <- 30*x1 + 40*x2 + 50*x3 + rnorm(1000, mean = 5000, sd = 1500)

# Create data frame
data <- data.frame(y, x1, x2, x3)

# Run regression
model <- lm(y ~ x1 + x2 + x3, data = data)
summary(model)
Call:
lm(formula = y ~ x1 + x2 + x3, data = data)
Residuals:
Min 1Q Median 3Q Max
-4584.3 -950.2 0.7 1012.9 4338.4
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2726.9512 1462.4421 1.865 0.0625 .
x1 29.8375 3.9740 7.508 1.33e-13 ***
x2 39.7564 0.9369 42.432 < 2e-16 ***
x3 65.4442 9.6348 6.792 1.89e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1489 on 996 degrees of freedom
Multiple R-squared: 0.6662, Adjusted R-squared: 0.6652
F-statistic: 662.5 on 3 and 996 DF, p-value: < 2.2e-16
This is equivalent to the following in Stata:
%%stata
* Set seed
set seed 123

* Set number of observations
set obs 1000

* Generate metric independent variable #1
drawnorm x1, n(1000) means(40) sds(12)

* Generate metric independent variable #2
drawnorm x2, n(1000) means(200) sds(50)

* Generate metric independent variable #3
drawnorm x3, n(1000) means(150) sds(5)

* Generate metric dependent variable with large coefficients
gen y = 30*x1 + 40*x2 + 50*x3 + rnormal(5000, 1500)

* Run model
regress y x1 x2 x3
etable
twoway (scatter y x2) (lfit y x2), title("Scatter Plot with Line of Best Fit") legend(off)
.
. * Set seed
. set seed 123
.
. * Set number of observations
. set obs 1000
Number of observations (_N) was 0, now 1,000.
.
. * Generate metric independent variable #1
. drawnorm x1, n(1000) means(40) sds(12)
.
. * Generate metric independent variable #2
. drawnorm x2, n(1000) means(200) sds(50)
.
. * Generate metric independent variable #3
. drawnorm x3, n(1000) means(150) sds(5)
.
. * Generate metric dependent variable with large coefficients
. gen y = 30*x1 + 40*x2 + 50*x3 + rnormal(5000, 1500)
.
. * Run Model
. regress y x1 x2 x3
Source | SS df MS Number of obs = 1,000
-------------+---------------------------------- F(3, 996) = 714.04
Model | 4.7321e+09 3 1.5774e+09 Prob > F = 0.0000
Residual | 2.2002e+09 996 2209043.89 R-squared = 0.6826
-------------+---------------------------------- Adj R-squared = 0.6817
Total | 6.9323e+09 999 6939202.43 Root MSE = 1486.3
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x1 | 29.4092 4.032441 7.29 0.000 21.49614 37.32225
x2 | 42.71715 .9441681 45.24 0.000 40.86437 44.56994
x3 | 54.51955 9.245538 5.90 0.000 36.37658 72.66252
_cons | 3778.896 1402.667 2.69 0.007 1026.375 6531.417
------------------------------------------------------------------------------
.
. etable
---------------------------------
y
---------------------------------
x1 29.409
(4.032)
x2 42.717
(0.944)
x3 54.520
(9.246)
Intercept 3778.896
(1402.667)
Number of observations 1000
---------------------------------
.
. twoway (scatter y x2) (lfit y x2), title("Scatter Plot with Line of Best Fit"
> ) legend(off)
.
Now don't worry too much that the results from the models aren't exactly identical. Although the seed is the same, each software has its own random number generator, so the simulated draws differ. Substantively the results are the same, and that is all that matters here. Now for Python.
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Set seed for reproducibility
np.random.seed(123)

# Generate metric independent variables
x1 = np.random.normal(loc=40, scale=12, size=1000)
x2 = np.random.normal(loc=200, scale=50, size=1000)
x3 = np.random.normal(loc=150, scale=5, size=1000)

# Generate metric dependent variable
y = 30*x1 + 40*x2 + 50*x3 + np.random.normal(loc=5000, scale=1500, size=1000)

# Create DataFrame
df = pd.DataFrame({'y': y, 'x1': x1, 'x2': x2, 'x3': x3})

# Run regression
X = df[['x1', 'x2', 'x3']]  # Independent variables
X = sm.add_constant(X)      # Add intercept
model = sm.OLS(df['y'], X).fit()  # Fit model

# Display summary
print(model.summary())
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.618
Model: OLS Adj. R-squared: 0.617
Method: Least Squares F-statistic: 537.7
Date: Sun, 06 Apr 2025 Prob (F-statistic): 1.03e-207
Time: 13:35:34 Log-Likelihood: -8719.9
No. Observations: 1000 AIC: 1.745e+04
Df Residuals: 996 BIC: 1.747e+04
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 5257.2104 1467.373 3.583 0.000 2377.713 8136.707
x1 26.4715 3.914 6.764 0.000 18.792 34.151
x2 38.7536 0.981 39.513 0.000 36.829 40.678
x3 51.5446 9.574 5.384 0.000 32.757 70.332
==============================================================================
Omnibus: 2.499 Durbin-Watson: 1.897
Prob(Omnibus): 0.287 Jarque-Bera (JB): 2.519
Skew: 0.122 Prob(JB): 0.284
Kurtosis: 2.964 Cond. No. 8.01e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 8.01e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
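Since we generated the data ourselves, we know the true coefficients are 30, 40, and 50. As a quick added sanity check (a sketch not in the original notebook, reusing the model object from the cell above), each estimate should land within roughly two standard errors of its true value:

# Added sanity check: compare the fitted coefficients against the true
# data-generating values (30, 40, 50).
true_betas = {'x1': 30, 'x2': 40, 'x3': 50}
for name, true_beta in true_betas.items():
    est = model.params[name]    # fitted coefficient from the model above
    se = model.bse[name]        # its standard error
    z = (est - true_beta) / se  # distance from the truth in SE units
    print(f"{name}: estimate = {est:.2f}, true = {true_beta}, |z| = {abs(z):.2f}")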
And finally, Mplus, run from R via the MplusAutomation package. I currently do not have access to a full Mplus licence, which causes some issues with getting Mplus to run properly; the examples below therefore use the built-in mtcars dataset rather than the simulated data, and for now this is all the Mplus output I can produce.
(Interactive CRAN mirror selection and download output omitted: install.packages() fetched the MplusAutomation 1.1.1 and texreg 1.39.4 binaries from a CRAN mirror, with a warning that the R X11 module could not be loaded.)
%%R
# Load necessary libraries
library(MplusAutomation)
library(texreg)

# Define the OLS linear regression model
olsModel <- mplusObject(
  TITLE = "OLS Linear Regression Model in Mplus;",
  MODEL = "mpg ON hp wt;",  # mpg as dependent variable, hp and wt as independent variables
  OUTPUT = "CINTERVAL;",    # Request confidence intervals for estimates
  rdata = mtcars            # Use the mtcars dataset
)

# Run the model
fit_ols <- mplusModeler(olsModel, modelout = "ols_model.inp", run = 1L)

# Display fit statistics
screenreg(fit_ols, summaries = c("Observations", "CFI", "SRMR"), single.row = TRUE)
==================================
TITLE
----------------------------------
MPG<-HP -0.03 (0.01) ***
MPG<-WT -3.88 (0.60) ***
MPG<-Intercepts 37.23 (1.52) ***
MPG<->MPG 6.10 (1.52) ***
----------------------------------
Observations 32
CFI 1.00
SRMR 0.00
==================================
*** p < 0.001; ** p < 0.01; * p < 0.05
FALSE
Version: 1.1.1
We work hard to write this free software. Please help us get credit by citing:
Hallquist, M. N. & Wiley, J. F. (2018). MplusAutomation: An R Package for Facilitating Large-Scale Latent Variable Analyses in Mplus. Structural Equation Modeling, 25, 621-638. doi: 10.1080/10705511.2017.1402334.
-- see citation("MplusAutomation").
Version: 1.39.4
Date: 2024-07-23
Author: Philip Leifeld (University of Manchester)
Consider submitting praise using the praise or praise_interactive functions.
Please cite the JSS article in your publications -- see citation("texreg").
The simple OLS linear regression model has now been estimated. We now fit the same model under an SEM framework.
%%stata
clear

* Set seed
set seed 123

* Set number of observations
set obs 1000

* Generate metric independent variable #1
drawnorm x1, n(1000) means(40) sds(12)

* Generate metric independent variable #2
drawnorm x2, n(1000) means(200) sds(50)

* Generate metric independent variable #3
drawnorm x3, n(1000) means(150) sds(5)

* Generate metric dependent variable with large coefficients
gen y = 30*x1 + 40*x2 + 50*x3 + rnormal(5000, 1500)

sem (y <- x1 x2 x3), method(mlmv)
etable, append
.
. clear
.
. * Set seed
. set seed 123
.
. * Set number of observations
. set obs 1000
Number of observations (_N) was 0, now 1,000.
.
. * Generate metric independent variable #1
. drawnorm x1, n(1000) means(40) sds(12)
.
. * Generate metric independent variable #2
. drawnorm x2, n(1000) means(200) sds(50)
.
. * Generate metric independent variable #3
. drawnorm x3, n(1000) means(150) sds(5)
.
. * Generate metric dependent variable with large coefficients
. gen y = 30*x1 + 40*x2 + 50*x3 + rnormal(5000, 1500)
.
. sem (y <- x1 x2 x3), method(mlmv)
Endogenous variables
Observed: y
Exogenous variables
Observed: x1 x2 x3
Fitting target model:
Iteration 0: Log likelihood = -20967.462
Iteration 1: Log likelihood = -20967.462
Structural equation model Number of obs = 1,000
Estimation method: mlmv
Log likelihood = -20967.462
------------------------------------------------------------------------------
| OIM
| Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
Structural |
y |
x1 | 29.4092 4.024368 7.31 0.000 21.52158 37.29681
x2 | 42.71715 .9422779 45.33 0.000 40.87032 44.56398
x3 | 54.51955 9.227029 5.91 0.000 36.43491 72.6042
_cons | 3778.896 1399.858 2.70 0.007 1035.223 6522.568
-------------+----------------------------------------------------------------
var(e.y)| 2200208 98396.28 2015565 2401765
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(0) = 0.00 Prob > chi2 = .
.
. etable, append
---------------------------------------------
y 2
---------------------------------------------
x1 29.409 29.409
(4.032) (4.024)
x2 42.717 42.717
(0.944) (0.942)
x3 54.520 54.520
(9.246) (9.227)
Intercept 3778.896 3778.896
(1402.667) (1399.858)
var(e.y) 2.20e+06
(98396.280)
Number of observations 1000 1000
---------------------------------------------
.
%%R
# Load necessary libraries
library(lavaan)
library(MASS)  # For mvrnorm (multivariate normal distribution)

# Set seed for reproducibility
set.seed(123)

# Generate metric independent variables
x1 <- rnorm(1000, mean = 40, sd = 12)
x2 <- rnorm(1000, mean = 200, sd = 50)
x3 <- rnorm(1000, mean = 150, sd = 5)

# Generate metric dependent variable
y <- 30*x1 + 40*x2 + 50*x3 + rnorm(1000, mean = 5000, sd = 1500)

# Create a data frame
data <- data.frame(y, x1, x2, x3)

# Define the SEM model
sem_model <- '
  y ~ x1 + x2 + x3
'

# Fit the model using Maximum Likelihood with Missing Values (MLMV)
fit <- sem(sem_model, data = data, missing = "ml")

# Print model summary
summary(fit, fit.measures = TRUE, standardized = TRUE)
lavaan 0.6-18 ended normally after 1 iteration
Estimator ML
Optimization method NLMINB
Number of model parameters 5
Number of observations 1000
Number of missing patterns 1
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Model Test Baseline Model:
Test statistic 1097.097
Degrees of freedom 3
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000
Tucker-Lewis Index (TLI) 1.000
Robust Comparative Fit Index (CFI) 1.000
Robust Tucker-Lewis Index (TLI) 1.000
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -8722.665
Loglikelihood unrestricted model (H1) -8722.665
Akaike (AIC) 17455.330
Bayesian (BIC) 17479.869
Sample-size adjusted Bayesian (SABIC) 17463.989
Root Mean Square Error of Approximation:
RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.000
P-value H_0: RMSEA <= 0.050 NA
P-value H_0: RMSEA >= 0.080 NA
Robust RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.000
P-value H_0: Robust RMSEA <= 0.050 NA
P-value H_0: Robust RMSEA >= 0.080 NA
Standardized Root Mean Square Residual:
SRMR 0.000
Parameter Estimates:
Standard errors Standard
Information Observed
Observed information based on Hessian
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
y ~
x1 29.837 3.966 7.523 0.000 29.837 0.138
x2 39.756 0.935 42.518 0.000 39.756 0.780
x3 65.444 9.616 6.806 0.000 65.444 0.124
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.y 2726.951 1459.514 1.868 0.062 2726.951 1.060
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.y 2207681.041 98745.450 22.357 0.000 2207681.041 0.334
This is lavaan 0.6-18
lavaan is FREE software! Please report any bugs.
In addition: Warning messages:
1: lavaan->lav_data_full():
some observed variances are larger than 1000000 use varTable(fit) to
investigate
2: lavaan->lavData():
some observed variances in the sample covariance matrix are larger than
1000000.
3: lavaan->lavData():
some observed variances in the sample covariance matrix are larger than
1000000.
4: lavaan->lavData():
some observed variances in the sample covariance matrix are larger than
1000000.
5: lavaan->lavData():
some observed variances in the sample covariance matrix are larger than
1000000.
For Python, we install the semopy package for SEM estimation.

pip install semopy
Requirement already satisfied: semopy in /opt/anaconda3/lib/python3.12/site-packages (2.3.11)
Requirement already satisfied: scipy in /opt/anaconda3/lib/python3.12/site-packages (from semopy) (1.13.1)
Requirement already satisfied: numpy in /opt/anaconda3/lib/python3.12/site-packages (from semopy) (1.26.4)
Requirement already satisfied: pandas in /opt/anaconda3/lib/python3.12/site-packages (from semopy) (2.2.2)
Requirement already satisfied: sympy in /opt/anaconda3/lib/python3.12/site-packages (from semopy) (1.13.2)
Requirement already satisfied: scikit-learn in /opt/anaconda3/lib/python3.12/site-packages (from semopy) (1.5.1)
Requirement already satisfied: statsmodels in /opt/anaconda3/lib/python3.12/site-packages (from semopy) (0.14.2)
Requirement already satisfied: numdifftools in /opt/anaconda3/lib/python3.12/site-packages (from semopy) (0.9.41)
Requirement already satisfied: python-dateutil>=2.8.2 in /opt/anaconda3/lib/python3.12/site-packages (from pandas->semopy) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /opt/anaconda3/lib/python3.12/site-packages (from pandas->semopy) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in /opt/anaconda3/lib/python3.12/site-packages (from pandas->semopy) (2023.3)
Requirement already satisfied: joblib>=1.2.0 in /opt/anaconda3/lib/python3.12/site-packages (from scikit-learn->semopy) (1.4.2)
Requirement already satisfied: threadpoolctl>=3.1.0 in /opt/anaconda3/lib/python3.12/site-packages (from scikit-learn->semopy) (3.5.0)
Requirement already satisfied: patsy>=0.5.6 in /opt/anaconda3/lib/python3.12/site-packages (from statsmodels->semopy) (0.5.6)
Requirement already satisfied: packaging>=21.3 in /opt/anaconda3/lib/python3.12/site-packages (from statsmodels->semopy) (24.1)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/anaconda3/lib/python3.12/site-packages (from sympy->semopy) (1.3.0)
Requirement already satisfied: six in /opt/anaconda3/lib/python3.12/site-packages (from patsy>=0.5.6->statsmodels->semopy) (1.16.0)
Note: you may need to restart the kernel to use updated packages.
import numpy as np
import pandas as pd
from semopy import Model

# Set random seed for reproducibility
np.random.seed(123)

# Generate metric independent variables
x1 = np.random.normal(loc=40, scale=12, size=1000)
x2 = np.random.normal(loc=200, scale=50, size=1000)
x3 = np.random.normal(loc=150, scale=5, size=1000)

# Generate metric dependent variable
y = 30*x1 + 40*x2 + 50*x3 + np.random.normal(loc=5000, scale=1500, size=1000)

# Create a DataFrame
data = pd.DataFrame({'y': y, 'x1': x1, 'x2': x2, 'x3': x3})

# Define the SEM model in semopy syntax
sem_model = """
y ~ x1 + x2 + x3
"""

# Create and fit the model
model = Model(sem_model)
model.fit(data)

# Print model results
print(model.inspect())
lval op rval Estimate Std. Err z-value p-value
0 y ~ x1 2.826314e+01 4.470083 6.322734 2.569751e-10
1 y ~ x2 3.870783e+01 1.120236 34.553275 0.000000e+00
2 y ~ x3 9.507102e+00 10.935110 0.869411 3.846225e-01
3 y ~~ y 2.875963e+06 128616.985180 22.360680 0.000000e+00
%%R
# Load necessary libraries
library(MplusAutomation)
library(texreg)

# Define the SEM model
semModel <- mplusObject(
  TITLE = "Structural Equation Model (SEM) in Mplus;",
  MODEL = "mpg ON hp wt;",            # Regression model in SEM form
  OUTPUT = "STANDARDIZED CINTERVAL;", # Request standardized results and confidence intervals
  rdata = mtcars                      # Use the mtcars dataset
)

# Run the SEM model in Mplus
fit_sem <- mplusModeler(
  semModel,
  modelout = "sem_model.inp",  # Output .inp file
  run = 1L                     # Run the model after writing input
)

# Display model fit statistics
screenreg(fit_sem, summaries = c("Observations", "CFI", "SRMR"), single.row = TRUE)
==================================
TITLE
----------------------------------
MPG<-HP -0.03 (0.01) ***
MPG<-WT -3.88 (0.60) ***
MPG<-Intercepts 37.23 (1.52) ***
MPG<->MPG 6.10 (1.52) ***
----------------------------------
Observations 32
CFI 1.00
SRMR 0.00
==================================
*** p < 0.001; ** p < 0.01; * p < 0.05
FALSE
Now we move on to a simple MCAR (missing completely at random) model, starting with Stata.
%%stata
clear

* Set seed
set seed 123

* Set number of observations
set obs 1000

* Generate metric independent variable #1
drawnorm x1, n(1000) means(40) sds(12)

* Generate metric independent variable #2
drawnorm x2, n(1000) means(200) sds(50)

* Generate metric independent variable #3
drawnorm x3, n(1000) means(150) sds(5)

* Generate metric dependent variable with large coefficients
gen y = 30*x1 + 40*x2 + 50*x3 + rnormal(5000, 1500)

* Generate MCAR
gen rmcar = rbinomial(1, 0.5)  // MCAR: 50% chance of missingness (binary random)
replace x2 = . if rmcar == 0   // Set x2 to missing where rmcar == 0

regress y x1 x2 x3
etable, append
twoway (scatter y x2) (lfit y x2), title("Scatter Plot with Line of Best Fit") legend(off)
.
. clear
.
. * Set seed
. set seed 123
.
. * Set number of observations
. set obs 1000
Number of observations (_N) was 0, now 1,000.
.
. * Generate metric independent variable #1
. drawnorm x1, n(1000) means(40) sds(12)
.
. * Generate metric independent variable #2
. drawnorm x2, n(1000) means(200) sds(50)
.
. * Generate metric independent variable #3
. drawnorm x3, n(1000) means(150) sds(5)
.
. * Generate metric dependent variable with large coefficients
. gen y = 30*x1 + 40*x2 + 50*x3 + rnormal(5000, 1500)
.
. * Generate MCAR
. gen rmcar = rbinomial(1, 0.5) // MCAR: 50% chance of missingness (binary ran
> dom)
. replace x2 = . if rmcar == 0 // Set x to missing where rmcar == 0
(492 real changes made, 492 to missing)
.
. regress y x1 x2 x3
Source | SS df MS Number of obs = 508
-------------+---------------------------------- F(3, 504) = 325.45
Model | 2.3023e+09 3 767444127 Prob > F = 0.0000
Residual | 1.1885e+09 504 2358099.32 R-squared = 0.6595
-------------+---------------------------------- Adj R-squared = 0.6575
Total | 3.4908e+09 507 6885235.58 Root MSE = 1535.6
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x1 | 25.21384 5.94201 4.24 0.000 13.53968 36.888
x2 | 42.55748 1.384992 30.73 0.000 39.83641 45.27855
x3 | 48.17947 13.13299 3.67 0.000 22.37733 73.98162
_cons | 4912.711 2007.467 2.45 0.015 968.6771 8856.744
------------------------------------------------------------------------------
.
. etable, append
--------------------------------------------------------
y 2 y
--------------------------------------------------------
x1 29.409 29.409 25.214
(4.032) (4.024) (5.942)
x2 42.717 42.717 42.557
(0.944) (0.942) (1.385)
x3 54.520 54.520 48.179
(9.246) (9.227) (13.133)
Intercept 3778.896 3778.896 4912.711
(1402.667) (1399.858) (2007.467)
var(e.y) 2.20e+06
(98396.280)
Number of observations 1000 1000 508
--------------------------------------------------------
.
. twoway (scatter y x2) (lfit y x2), title("Scatter Plot with Line of Best Fit"
> ) legend(off)
.
.
%%R
# Set seed for reproducibility
set.seed(123)

# Generate metric independent variables
x1 <- rnorm(1000, mean = 40, sd = 12)
x2 <- rnorm(1000, mean = 200, sd = 50)
x3 <- rnorm(1000, mean = 150, sd = 5)

# Generate metric dependent variable
y <- 30*x1 + 40*x2 + 50*x3 + rnorm(1000, mean = 5000, sd = 1500)

# Introduce 50% Missing Completely at Random (MCAR) in x2
rmcar <- rbinom(1000, 1, 0.5)  # Binary indicator: 50% missingness
x2[rmcar == 0] <- NA           # Set x2 to NA for half the observations

# Create data frame
data <- data.frame(y, x1, x2, x3)

# Run OLS regression (handling missing data with listwise deletion by default)
ols_model <- lm(y ~ x1 + x2 + x3, data = data)

# Display regression results
summary(ols_model)
Call:
lm(formula = y ~ x1 + x2 + x3, data = data)
Residuals:
Min 1Q Median 3Q Max
-4464.9 -1057.0 -31.6 1042.2 4294.4
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 386.332 2177.578 0.177 0.859
x1 29.499 5.908 4.993 8.28e-07 ***
x2 40.483 1.364 29.671 < 2e-16 ***
x3 80.047 14.416 5.553 4.62e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1548 on 489 degrees of freedom
(507 observations deleted due to missingness)
Multiple R-squared: 0.6697, Adjusted R-squared: 0.6677
F-statistic: 330.5 on 3 and 489 DF, p-value: < 2.2e-16
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Set seed for reproducibility
np.random.seed(123)

# Set number of observations
n = 1000

# Generate independent variables
x1 = np.random.normal(loc=40, scale=12, size=n)
x2 = np.random.normal(loc=200, scale=50, size=n)
x3 = np.random.normal(loc=150, scale=5, size=n)

# Generate dependent variable
y = 30*x1 + 40*x2 + 50*x3 + np.random.normal(loc=5000, scale=1500, size=n)

# Introduce 50% Missing Completely at Random (MCAR) in x2
rmcar = np.random.binomial(1, 0.5, size=n)  # Binary indicator: 50% missingness
x2[rmcar == 0] = np.nan                     # Set x2 to NaN for half the observations

# Create DataFrame
data = pd.DataFrame({'y': y, 'x1': x1, 'x2': x2, 'x3': x3})

# Drop rows with missing values (listwise deletion) and run OLS regression
data_complete = data.dropna()  # Remove rows with NaN
X = data_complete[['x1', 'x2', 'x3']]
X = sm.add_constant(X)  # Add intercept
y = data_complete['y']

# Fit OLS model
model = sm.OLS(y, X).fit()

# Display results
print(model.summary())
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.631
Model: OLS Adj. R-squared: 0.629
Method: Least Squares F-statistic: 295.0
Date: Sun, 06 Apr 2025 Prob (F-statistic): 1.19e-111
Time: 13:35:55 Log-Likelihood: -4541.7
No. Observations: 522 AIC: 9091.
Df Residuals: 518 BIC: 9108.
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 4652.1226 2050.412 2.269 0.024 623.978 8680.268
x1 20.5814 5.133 4.010 0.000 10.498 30.665
x2 39.7328 1.342 29.605 0.000 37.096 42.369
x3 55.6552 13.271 4.194 0.000 29.584 81.726
==============================================================================
Omnibus: 4.773 Durbin-Watson: 1.958
Prob(Omnibus): 0.092 Jarque-Bera (JB): 4.593
Skew: 0.223 Prob(JB): 0.101
Kurtosis: 3.108 Cond. No. 8.24e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 8.24e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
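Under MCAR, whether x2 is missing is unrelated to every variable in the data. As a quick check of that property (an added sketch, not part of the original notebook, reusing data and sm from the cell above), we can regress a missingness indicator on the fully observed variables; under MCAR none of them should predict missingness:

# Added diagnostic sketch: under MCAR, missingness on x2 should be
# unpredictable from the observed data, so none of these coefficients
# should be significant.
miss = data['x2'].isna().astype(int)          # 1 = x2 missing, 0 = observed
Z = sm.add_constant(data[['y', 'x1', 'x3']])  # fully observed variables
logit_fit = sm.Logit(miss, Z).fit(disp=0)
print(logit_fit.summary())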
Now we move on to a MAR (missing at random) model, again starting with Stata.
%%stata
clear

* Set seed
set seed 123

* Set number of observations
set obs 1000

* Generate metric independent variable #1
drawnorm x1, n(1000) means(40) sds(12)

* Generate metric independent variable #2
drawnorm x2, n(1000) means(200) sds(50)

* Generate metric independent variable #3
drawnorm x3, n(1000) means(150) sds(5)

* Generate metric dependent variable with large coefficients
gen y = 30*x1 + 40*x2 + 50*x3 + rnormal(5000, 1500)

* Generate MAR
gen prob_mar = logistic(y-21791)
gen rmar = 0 if prob_mar==0
replace x2 = . if rmar == 0  // Set x2 to missing where rmar == 0

regress y x1 x2 x3
etable, append
twoway (scatter y x2) (lfit y x2), title("Scatter Plot with Line of Best Fit") legend(off)
.
. clear
.
. * Set seed
. set seed 123
.
. * Set number of observations
. set obs 1000
Number of observations (_N) was 0, now 1,000.
.
. * Generate metric independent variable #1
. drawnorm x1, n(1000) means(40) sds(12)
.
. * Generate metric independent variable #2
. drawnorm x2, n(1000) means(200) sds(50)
.
. * Generate metric independent variable #3
. drawnorm x3, n(1000) means(150) sds(5)
.
. * Generate metric dependent variable with large coefficients
. gen y = 30*x1 + 40*x2 + 50*x3 + rnormal(5000, 1500)
.
. * Generate MAR
. gen prob_mar = logistic(y-21791)
. gen rmar = 0 if prob_mar==0
(500 missing values generated)
. replace x2 = . if rmar == 0 // Set x to missing where rmar == 0
(500 real changes made, 500 to missing)
. regress y x1 x2 x3
Source | SS df MS Number of obs = 500
-------------+---------------------------------- F(3, 496) = 122.89
Model | 552925039 3 184308346 Prob > F = 0.0000
Residual | 743864709 496 1499727.24 R-squared = 0.4264
-------------+---------------------------------- Adj R-squared = 0.4229
Total | 1.2968e+09 499 2598777.05 Root MSE = 1224.6
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x1 | 20.14886 4.982177 4.04 0.000 10.36009 29.93763
x2 | 26.96303 1.407434 19.16 0.000 24.19777 29.7283
x3 | 19.7006 10.99788 1.79 0.074 -1.907582 41.30878
_cons | 13742.28 1717.077 8.00 0.000 10368.64 17115.92
------------------------------------------------------------------------------
.
. etable, append
-------------------------------------------------------------------
y 2 y y
-------------------------------------------------------------------
x1 29.409 29.409 25.214 20.149
(4.032) (4.024) (5.942) (4.982)
x2 42.717 42.717 42.557 26.963
(0.944) (0.942) (1.385) (1.407)
x3 54.520 54.520 48.179 19.701
(9.246) (9.227) (13.133) (10.998)
Intercept 3778.896 3778.896 4912.711 13742.276
(1402.667) (1399.858) (2007.467) (1717.077)
var(e.y) 2.20e+06
(98396.280)
Number of observations 1000 1000 508 500
-------------------------------------------------------------------
.
. twoway (scatter y x2) (lfit y x2), title("Scatter Plot with Line of Best Fit"
> ) legend(off)
.
%%R
# Set seed to ensure reproducibility
set.seed(123)

# Generate metric independent variables
x1 <- rnorm(1000, mean = 40, sd = 12)
x2 <- rnorm(1000, mean = 200, sd = 50)
x3 <- rnorm(1000, mean = 150, sd = 5)

# Generate metric dependent variable
y <- 30*x1 + 40*x2 + 50*x3 + rnorm(1000, mean = 5000, sd = 1500)

# Generate MAR (Missing At Random) pattern
prob_mar <- 1 / (1 + exp(-(y - 21791)))  # Logistic function
rmar <- rbinom(1000, 1, prob_mar)        # Generate missing indicator based on probability
x2[rmar == 0] <- NA                      # Set x2 to NA for MAR missingness

# Create data frame
data <- data.frame(y, x1, x2, x3)

# Run OLS regression (listwise deletion by default)
ols_model <- lm(y ~ x1 + x2 + x3, data = data, na.action = na.omit)

# Display regression results
summary(ols_model)
Call:
lm(formula = y ~ x1 + x2 + x3, data = data, na.action = na.omit)
Residuals:
Min 1Q Median 3Q Max
-2876.0 -828.4 -78.3 670.2 3761.0
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12209.232 1598.736 7.637 1.13e-13 ***
x1 19.693 4.588 4.292 2.12e-05 ***
x2 24.731 1.351 18.300 < 2e-16 ***
x3 33.021 10.191 3.240 0.00127 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1161 on 504 degrees of freedom
(492 observations deleted due to missingness)
Multiple R-squared: 0.4173, Adjusted R-squared: 0.4138
F-statistic: 120.3 on 3 and 504 DF, p-value: < 2.2e-16
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Set seed for reproducibility
np.random.seed(123)

# Set number of observations
n = 1000

# Generate independent variables
x1 = np.random.normal(40, 12, n)
x2 = np.random.normal(200, 50, n)
x3 = np.random.normal(150, 5, n)

# Generate dependent variable
y = 3*x1 + 4*x2 + 4*x3 + np.random.normal(500, 150, n)

# Generate MAR (Missing At Random) for x2
# Logistic function with scaled y values
prob_mar = 1 / (1 + np.exp(y - 2050))   # Apply logistic function
rmar = np.random.binomial(1, prob_mar)  # Generate MAR indicator
x2[rmar == 0] = np.nan                  # Set x2 to missing

# Create DataFrame
data = pd.DataFrame({'y': y, 'x1': x1, 'x2': x2, 'x3': x3})

# Run OLS regression (listwise deletion by default)
data_clean = data.dropna()  # Remove rows with missing values
X = sm.add_constant(data_clean[['x1', 'x2', 'x3']])  # Add intercept
model = sm.OLS(data_clean['y'], X).fit()

# Display regression results
print(model.summary())
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.398
Model: OLS Adj. R-squared: 0.395
Method: Least Squares F-statistic: 116.1
Date: Sun, 06 Apr 2025 Prob (F-statistic): 9.95e-58
Time: 13:35:55 Log-Likelihood: -3262.9
No. Observations: 531 AIC: 6534.
Df Residuals: 527 BIC: 6551.
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 872.2379 153.985 5.664 0.000 569.737 1174.738
x1 1.9156 0.407 4.708 0.000 1.116 2.715
x2 2.3273 0.128 18.251 0.000 2.077 2.578
x3 3.3532 1.005 3.338 0.001 1.380 5.327
==============================================================================
Omnibus: 7.685 Durbin-Watson: 1.802
Prob(Omnibus): 0.021 Jarque-Bera (JB): 7.560
Skew: -0.275 Prob(JB): 0.0228
Kurtosis: 3.201 Cond. No. 7.31e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 7.31e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
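By contrast, under this MAR mechanism missingness on x2 is driven by y, so cases with missing x2 should have systematically higher y. A small added check (a sketch not in the original notebook, reusing data from the Python cell above):

# Added diagnostic sketch: under this MAR mechanism, x2 goes missing when y
# is large, so the two groups should differ sharply on y.
from scipy import stats
miss_mask = data['x2'].isna()
print(f"mean y when x2 is missing:  {data.loc[miss_mask, 'y'].mean():.1f}")
print(f"mean y when x2 is observed: {data.loc[~miss_mask, 'y'].mean():.1f}")
t, p = stats.ttest_ind(data.loc[miss_mask, 'y'], data.loc[~miss_mask, 'y'])
print(f"t = {t:.2f}, p = {p:.3g}")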
Now that we have our baseline MAR model in each software, we can focus on implementing potential solutions for handling the missing data, starting with Stata and single mean imputation.
%%stata
clear

* Set seed
set seed 123

* Set number of observations
set obs 1000

* Generate metric independent variable #1
drawnorm x1, n(1000) means(40) sds(12)

* Generate metric independent variable #2
drawnorm x2, n(1000) means(200) sds(50)

* Generate metric independent variable #3
drawnorm x3, n(1000) means(150) sds(5)

* Generate metric dependent variable with large coefficients
gen y = 30*x1 + 40*x2 + 50*x3 + rnormal(5000, 1500)

* Generate MAR
gen prob_mar = logistic(y-21791)
gen rmar = 0 if prob_mar==0
replace x2 = . if rmar == 0  // Set x2 to missing where rmar == 0

* Single mean imputation of x2
summarize x2, meanonly
replace x2 = r(mean) if missing(x2)

regress y x1 x2 x3
etable, append
twoway (scatter y x2) (lfit y x2), title("Scatter Plot with Line of Best Fit") legend(off)
.
. clear
.
. * Set seed
. set seed 123
.
. * Set number of observations
. set obs 1000
Number of observations (_N) was 0, now 1,000.
.
. * Generate metric independent variable #1
. drawnorm x1, n(1000) means(40) sds(12)
.
. * Generate metric independent variable #2
. drawnorm x2, n(1000) means(200) sds(50)
.
. * Generate metric independent variable #3
. drawnorm x3, n(1000) means(150) sds(5)
.
. * Generate metric dependent variable with large coefficients
. gen y = 30*x1 + 40*x2 + 50*x3 + rnormal(5000, 1500)
.
. * Generate MAR
. gen prob_mar = logistic(y-21791)
. gen rmar = 0 if prob_mar==0
(500 missing values generated)
. replace x2 = . if rmar == 0 // Set x to missing where rmar == 0
(500 real changes made, 500 to missing)
.
. summarize x2, meanonly
. replace x2 = r(mean) if missing(x2)
(500 real changes made)
.
. regress y x1 x2 x3
Source | SS df MS Number of obs = 1,000
-------------+---------------------------------- F(3, 996) = 44.23
Model | 814884957 3 271628319 Prob > F = 0.0000
Residual | 6.1174e+09 996 6141946.06 R-squared = 0.1175
-------------+---------------------------------- Adj R-squared = 0.1149
Total | 6.9323e+09 999 6939202.43 Root MSE = 2478.3
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x1 | 36.86274 6.772257 5.44 0.000 23.57321 50.15227
x2 | 28.00538 2.82263 9.92 0.000 22.4664 33.54436
x3 | 63.7622 15.41944 4.14 0.000 33.50389 94.02052
_cons | 4173.834 2437.463 1.71 0.087 -609.3184 8956.987
------------------------------------------------------------------------------
.
. etable, append
------------------------------------------------------------------------------
y 2 y y y
------------------------------------------------------------------------------
x1 29.409 29.409 25.214 20.149 36.863
(4.032) (4.024) (5.942) (4.982) (6.772)
x2 42.717 42.717 42.557 26.963 28.005
(0.944) (0.942) (1.385) (1.407) (2.823)
x3 54.520 54.520 48.179 19.701 63.762
(9.246) (9.227) (13.133) (10.998) (15.419)
Intercept 3778.896 3778.896 4912.711 13742.276 4173.834
(1402.667) (1399.858) (2007.467) (1717.077) (2437.463)
var(e.y) 2.20e+06
(98396.280)
Number of observations 1000 1000 508 500 1000
------------------------------------------------------------------------------
.
. twoway (scatter y x2) (lfit y x2), title("Scatter Plot with Line of Best Fit"
> ) legend(off)
.
%%R
# Set seed for reproducibility
set.seed(123)

# Generate metric independent variables
x1 <- rnorm(1000, mean = 40, sd = 12)
x2 <- rnorm(1000, mean = 200, sd = 50)
x3 <- rnorm(1000, mean = 150, sd = 5)

# Generate metric dependent variable
y <- 30*x1 + 40*x2 + 50*x3 + rnorm(1000, mean = 5000, sd = 1500)

# Generate MAR (Missing At Random) for x2 using logistic function
prob_mar <- 1 / (1 + exp(-(y - 21791)))  # Logistic function
rmar <- rbinom(1000, 1, prob_mar)        # Generate MAR indicator
x2[rmar == 0] <- NA                      # Set x2 to missing where rmar == 0

# Compute the mean of the observed x2 values
mean_x2 <- mean(x2, na.rm = TRUE)

# Replace missing values of x2 with the mean of x2
x2[is.na(x2)] <- mean_x2

# Create a data frame with the variables
data <- data.frame(y, x1, x2, x3)

# Run OLS regression
model <- lm(y ~ x1 + x2 + x3, data = data)

# Display the regression results
summary(model)
Call:
lm(formula = y ~ x1 + x2 + x3, data = data)
Residuals:
Min 1Q Median 3Q Max
-7956.9 -1644.3 163.7 1915.5 5959.5
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1910.091 2459.762 0.777 0.438
x1 43.837 6.384 6.866 1.16e-11 ***
x2 24.910 2.794 8.915 < 2e-16 ***
x3 81.888 15.540 5.270 1.68e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2401 on 996 degrees of freedom
Multiple R-squared: 0.1319, Adjusted R-squared: 0.1293
F-statistic: 50.46 on 3 and 996 DF, p-value: < 2.2e-16
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Set seed for reproducibility
np.random.seed(123)

# Set number of observations
n = 1000

# Generate independent variables
x1 = np.random.normal(40, 12, n)
x2 = np.random.normal(200, 50, n)
x3 = np.random.normal(150, 5, n)

# Generate dependent variable
y = 3*x1 + 4*x2 + 4*x3 + np.random.normal(500, 150, n)

# Generate MAR (Missing At Random) for x2
# Logistic function with scaled y values
prob_mar = 1 / (1 + np.exp(y - 2050))   # Apply logistic function
rmar = np.random.binomial(1, prob_mar)  # Generate MAR indicator
x2[rmar == 0] = np.nan                  # Set x2 to missing

# Replace missing values in x2 with the mean of x2
x2[np.isnan(x2)] = np.nanmean(x2)

# Create DataFrame
data = pd.DataFrame({'y': y, 'x1': x1, 'x2': x2, 'x3': x3})

# Run OLS regression
X = sm.add_constant(data[['x1', 'x2', 'x3']])  # Add intercept
model = sm.OLS(data['y'], X).fit()

# Display regression results
print(model.summary())
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.094
Model: OLS Adj. R-squared: 0.091
Method: Least Squares F-statistic: 34.32
Date: Sun, 06 Apr 2025 Prob (F-statistic): 4.11e-21
Time: 13:35:55 Log-Likelihood: -6848.0
No. Observations: 1000 AIC: 1.370e+04
Df Residuals: 996 BIC: 1.372e+04
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 958.4029 228.310 4.198 0.000 510.379 1406.427
x1 2.6749 0.604 4.426 0.000 1.489 3.861
x2 2.3577 0.256 9.202 0.000 1.855 2.860
x3 3.7336 1.473 2.535 0.011 0.844 6.623
==============================================================================
Omnibus: 45.270 Durbin-Watson: 1.968
Prob(Omnibus): 0.000 Jarque-Bera (JB): 29.371
Skew: 0.292 Prob(JB): 4.19e-07
Kurtosis: 2.398 Cond. No. 7.36e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 7.36e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
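The attenuation of the x2 coefficient (about 2.36 against a generating value of 4) is the classic failure of single mean imputation: filling half the cases with a constant shrinks the variance of x2 and severs its link with y for those cases. A small added illustration (a sketch not in the original notebook, reusing data from the cell above):

# Added sketch: mean imputation compresses the spread of x2.
# x2 was generated with sd = 50; after imputing the mean into ~50% of
# cases, its sample sd should drop well below that.
print(f"sd of x2 after mean imputation: {data['x2'].std():.1f} (generated with sd = 50)")
imputed_value = data['x2'].mode()[0]  # the single repeated (imputed) value
print(f"share of cases set to the imputed mean: {(data['x2'] == imputed_value).mean():.2f}")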
Now we move on to FIML (full-information maximum likelihood) approaches, with Stata first.
%%stata
clear

* Set seed
set seed 123

* Set number of observations
set obs 1000

* Generate metric independent variable #1
drawnorm x1, n(1000) means(40) sds(12)

* Generate metric independent variable #2
drawnorm x2, n(1000) means(200) sds(50)

* Generate metric independent variable #3
drawnorm x3, n(1000) means(150) sds(5)

* Generate metric dependent variable with large coefficients
gen y = 30*x1 + 40*x2 + 50*x3 + rnormal(5000, 1500)

* Generate MAR
gen prob_mar = logistic(y-21791)
gen rmar = 0 if prob_mar==0
replace x2 = . if rmar == 0  // Set x2 to missing where rmar == 0

* Model estimation
sem (y <- x1 x2 x3), method(mlmv)
etable, append
twoway (scatter y x2) (lfit y x2), title("Scatter Plot with Line of Best Fit") legend(off)
.
. clear
.
. * Set seed
. set seed 123
.
. * Set number of observations
. set obs 1000
Number of observations (_N) was 0, now 1,000.
.
. * Generate metric independent variable #1
. drawnorm x1, n(1000) means(40) sds(12)
.
. * Generate metric independent variable #2
. drawnorm x2, n(1000) means(200) sds(50)
.
. * Generate metric independent variable #3
. drawnorm x3, n(1000) means(150) sds(5)
.
. * Generate metric dependent variable with large coefficients
. gen y = 30*x1 + 40*x2 + 50*x3 + rnormal(5000, 1500)
.
. * Generate MAR
. gen prob_mar = logistic(y-21791)
. gen rmar = 0 if prob_mar==0
(500 missing values generated)
. replace x2 = . if rmar == 0 // Set x to missing where rmar == 0
(500 real changes made, 500 to missing)
.
. * Model estimation
. sem (y <- x1 x2 x3), method(mlmv)
note: Missing values found in observed exogenous variables. Using the
noxconditional behavior. Specify the forcexconditional option to
override this behavior.
Endogenous variables
Observed: y
Exogenous variables
Observed: x1 x2 x3
Fitting saturated model:
Iteration 0: Log likelihood = -19809.893
Iteration 1: Log likelihood = -18956.184
Iteration 2: Log likelihood = -18603.225
Iteration 3: Log likelihood = -18601.025
Iteration 4: Log likelihood = -18601
Iteration 5: Log likelihood = -18601
Fitting baseline model:
Iteration 0: Log likelihood = -18886.253
Iteration 1: Log likelihood = -18792.75
Iteration 2: Log likelihood = -18757.19
Iteration 3: Log likelihood = -18754.887
Iteration 4: Log likelihood = -18754.864
Iteration 5: Log likelihood = -18754.864
Fitting target model:
Iteration 0: Log likelihood = -18601
Iteration 1: Log likelihood = -18601
Structural equation model Number of obs = 1,000
Estimation method: mlmv
Log likelihood = -18601
------------------------------------------------------------------------------
| OIM
| Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
Structural |
y |
x1 | 38.10574 5.583454 6.82 0.000 27.16237 49.04911
x2 | 41.70024 1.300009 32.08 0.000 39.15227 44.24821
x3 | 40.31928 12.42463 3.25 0.001 15.96746 64.6711
_cons | 5775.744 1925.672 3.00 0.003 2001.496 9549.992
-------------+----------------------------------------------------------------
mean(x1)| 40.24597 .3686473 109.17 0.000 39.52343 40.96851
mean(x2)| 200.6471 2.530853 79.28 0.000 195.6867 205.6075
mean(x3)| 149.797 .1607991 931.58 0.000 149.4818 150.1121
-------------+----------------------------------------------------------------
var(e.y)| 2300879 187511.7 1961212 2699374
var(x1)| 135.9009 6.077671 124.496 148.3505
var(x2)| 2554.76 197.2925 2195.916 2972.244
var(x3)| 25.85634 1.156331 23.68646 28.225
-------------+----------------------------------------------------------------
cov(x1,x2)| -29.66936 22.2683 -1.33 0.183 -73.31444 13.97571
cov(x1,x3)| 1.100681 1.874863 0.59 0.557 -2.573983 4.775344
cov(x2,x3)| 12.02246 9.573148 1.26 0.209 -6.740568 30.78548
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(0) = 0.00 Prob > chi2 = .
.
. etable, append
-----------------------------------------------------------------------------------------
y 2 y y y 6
-----------------------------------------------------------------------------------------
x1 29.409 29.409 25.214 20.149 36.863 38.106
(4.032) (4.024) (5.942) (4.982) (6.772) (5.583)
x2 42.717 42.717 42.557 26.963 28.005 41.700
(0.944) (0.942) (1.385) (1.407) (2.823) (1.300)
x3 54.520 54.520 48.179 19.701 63.762 40.319
(9.246) (9.227) (13.133) (10.998) (15.419) (12.425)
Intercept 3778.896 3778.896 4912.711 13742.276 4173.834 5775.744
(1402.667) (1399.858) (2007.467) (1717.077) (2437.463) (1925.672)
var(e.y) 2.20e+06 2.30e+06
(98396.280) (1.88e+05)
mean(x1) 40.246
(0.369)
mean(x2) 200.647
(2.531)
mean(x3) 149.797
(0.161)
var(x1) 135.901
(6.078)
var(x2) 2554.760
(197.292)
var(x3) 25.856
(1.156)
cov(x1,x2) -29.669
(22.268)
cov(x1,x3) 1.101
(1.875)
cov(x2,x3) 12.022
(9.573)
Number of observations 1000 1000 508 500 1000 1000
-----------------------------------------------------------------------------------------
.
. twoway (scatter y x2) (lfit y x2), title("Scatter Plot with Line of Best Fit"
> ) legend(off)
.
%%R
# Set seed for reproducibility
set.seed(123)

# Generate metric independent variables
x1 <- rnorm(1000, mean = 40, sd = 12)
x2 <- rnorm(1000, mean = 200, sd = 50)
x3 <- rnorm(1000, mean = 150, sd = 5)

# Generate metric dependent variable
y <- 30*x1 + 40*x2 + 50*x3 + rnorm(1000, mean = 5000, sd = 1500)

# Generate MAR (Missing At Random) for x2
prob_mar <- 1 / (1 + exp(-(y - 21791)))  # Logistic function
rmar <- ifelse(prob_mar == 0, 0, 1)      # 0 if prob_mar is 0, 1 otherwise
x2[rmar == 0] <- NA                      # Set x2 to missing where rmar == 0

# Load 'lavaan' package for SEM (Structural Equation Modeling)
library(lavaan)

# Define the model formula (SEM model)
model <- '
  y ~ x1 + x2 + x3  # y as a function of x1, x2, and x3
'

# Fit the model with lavaan. Note: without missing = "ml", lavaan falls back
# to listwise deletion (606 of 1000 cases used below) rather than FIML;
# adding missing = "ml" as in the earlier lavaan cell would request FIML.
fit <- sem(model, data = data.frame(y, x1, x2, x3))

# Display the model summary
summary(fit)
lavaan 0.6-18 ended normally after 1 iteration
Estimator ML
Optimization method NLMINB
Number of model parameters 4
Used Total
Number of observations 606 1000
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|)
y ~
x1 19.799 4.270 4.637 0.000
x2 27.159 1.260 21.549 0.000
x3 41.840 9.653 4.334 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.y 1469996.086 84449.069 17.407 0.000
In addition: Warning message:
lavaan->lav_data_full():
some observed variances are larger than 1000000 use varTable(fit) to
investigate
To my knowledge, there is no well-established FIML estimation routine within Python, which rules out FIML as a missing-data technique there; alternative methods such as imputation have to be used instead. This will now be demonstrated, again starting with Stata (a Python sketch follows at the end of this section). For the purposes of this simulation, because all the variables are continuous, a multivariate-normal (MVN) multiple imputation approach could be used, although MICE has the advantage of allowing variables of different types to be imputed.
%%stata
clear

* Set seed
set seed 123

* Set number of observations
set obs 1000

* Generate metric independent variable #1
drawnorm x1, n(1000) means(40) sds(12)

* Generate metric independent variable #2
drawnorm x2, n(1000) means(200) sds(50)

* Generate metric independent variable #3
drawnorm x3, n(1000) means(150) sds(5)

* Generate metric dependent variable with large coefficients
gen y = 30*x1 + 40*x2 + 50*x3 + rnormal(5000, 1500)

* Generate MAR
gen prob_mar = logistic(y-21791)
gen rmar = 0 if prob_mar==0
replace x2 = . if rmar == 0  // Set x2 to missing where rmar == 0

mi set wide
mi register imputed y x1 x2 x3
tab _mi_miss

mi impute chained ///
    (regress) y x1 x2 x3 ///
    , rseed(12345) dots force add(10) burnin(10)

mi estimate, post dots: regress y x1 x2 x3
etable, append
.
. clear
.
. * Set seed
. set seed 123
.
. * Set number of observations
. set obs 1000
Number of observations (_N) was 0, now 1,000.
.
. * Generate metric independent variable #1
. drawnorm x1, n(1000) means(40) sds(12)
.
. * Generate metric independent variable #2
. drawnorm x2, n(1000) means(200) sds(50)
.
. * Generate metric independent variable #3
. drawnorm x3, n(1000) means(150) sds(5)
.
. * Generate metric dependent variable with large coefficients
. gen y = 30*x1 + 40*x2 + 50*x3 + rnormal(5000, 1500)
.
. * Generate MAR
. gen prob_mar = logistic(y-21791)
. gen rmar = 0 if prob_mar==0
(500 missing values generated)
. replace x2 = . if rmar == 0 // Set x to missing where rmar == 0
(500 real changes made, 500 to missing)
.
. mi set wide
.
. mi register imputed y x1 x2 x3
.
. tab _mi_miss
Incomplete |
observation | Freq. Percent Cum.
------------+-----------------------------------
0 | 500 50.00 50.00
1 | 500 50.00 100.00
------------+-----------------------------------
Total | 1,000 100.00
.
.
. mi impute chained ///
> ///
> (regress) y x1 x2 x3 ///
> , rseed(12345) dots force add(10) burnin(10)
note: variables y x1 x3 contain no soft missing (.) values; imputing nothing
note: missing-value pattern is monotone; no iteration performed.
Conditional models (monotone):
x2: regress x2 x3 x1 y
Performing chained iterations:
imputing m=1 through m=10 .........10 done
Multivariate imputation Imputations = 10
Chained equations added = 10
Imputed: m=1 through m=10 updated = 0
Initialization: monotone Iterations = 0
burn-in = 0
y: linear regression
x1: linear regression
x2: linear regression
x3: linear regression
------------------------------------------------------------------
| Observations per m
|----------------------------------------------
Variable | Complete Incomplete Imputed | Total
-------------------+-----------------------------------+----------
y | 1000 0 0 | 1000
x1 | 1000 0 0 | 1000
x2 | 500 500 500 | 1000
x3 | 1000 0 0 | 1000
------------------------------------------------------------------
(Complete + Incomplete = Total; Imputed is the minimum across m
of the number of filled-in observations.)
.
.
. mi estimate, post dots: regress y x1 x2 x3
Imputations (10):
.........10 done
Multiple-imputation estimates Imputations = 10
Linear regression Number of obs = 1,000
Average RVI = 2.3440
Largest FMI = 0.5779
Complete DF = 996
DF adjustment: Small sample DF: min = 27.94
avg = 60.86
max = 124.57
Model F test: Equal FMI F( 3, 106.8) = 365.04
Within VCE type: OLS Prob > F = 0.0000
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x1 | 36.94599 4.821662 7.66 0.000 27.403 46.48898
x2 | 42.0321 1.471605 28.56 0.000 39.01734 45.04686
x3 | 39.50875 12.21717 3.23 0.002 15.03525 63.98224
_cons | 5850.553 2031.63 2.88 0.007 1725.712 9975.395
------------------------------------------------------------------------------
.
. etable, append
----------------------------------------------------------------------------------------------------
y 2 y y y 6 y
----------------------------------------------------------------------------------------------------
x1 29.409 29.409 25.214 20.149 36.863 38.106 36.946
(4.032) (4.024) (5.942) (4.982) (6.772) (5.583) (4.822)
x2 42.717 42.717 42.557 26.963 28.005 41.700 42.032
(0.944) (0.942) (1.385) (1.407) (2.823) (1.300) (1.472)
x3 54.520 54.520 48.179 19.701 63.762 40.319 39.509
(9.246) (9.227) (13.133) (10.998) (15.419) (12.425) (12.217)
Intercept 3778.896 3778.896 4912.711 13742.276 4173.834 5775.744 5850.553
(1402.667) (1399.858) (2007.467) (1717.077) (2437.463) (1925.672) (2031.630)
var(e.y) 2.20e+06 2.30e+06
(98396.280) (1.88e+05)
mean(x1) 40.246
(0.369)
mean(x2) 200.647
(2.531)
mean(x3) 149.797
(0.161)
var(x1) 135.901
(6.078)
var(x2) 2554.760
(197.292)
var(x3) 25.856
(1.156)
cov(x1,x2) -29.669
(22.268)
cov(x1,x3) 1.101
(1.875)
cov(x2,x3) 12.022
(9.573)
Number of observations 1000 1000 508 500 1000 1000 1000
----------------------------------------------------------------------------------------------------
.
Now the equivalent in R.
%%R
# Install and load the `mice` package for multiple imputation (if not already installed)
if (!require(mice)) install.packages("mice", dependencies = TRUE)
library(mice)

# Set seed for reproducibility
set.seed(123)

# Generate metric independent variables
x1 <- rnorm(1000, mean = 40, sd = 12)
x2 <- rnorm(1000, mean = 200, sd = 50)
x3 <- rnorm(1000, mean = 150, sd = 5)

# Generate metric dependent variable
y <- 30*x1 + 40*x2 + 50*x3 + rnorm(1000, mean = 5000, sd = 1500)

# Generate MAR (Missing At Random) for x2
prob_mar <- 1 / (1 + exp(-(y - 21791)))  # Logistic function
rmar <- ifelse(prob_mar == 0, 0, 1)      # 0 if prob_mar is 0, 1 otherwise
x2[rmar == 0] <- NA                      # Set x2 to missing where rmar == 0

# Prepare the data for multiple imputation
data <- data.frame(y, x1, x2, x3)

# Perform multiple imputation using chained equations (MICE)
imp <- mice(data, m = 10, method = "pmm", seed = 12345)

# Perform regression after imputation
fit <- with(imp, lm(y ~ x1 + x2 + x3))

# Combine the results of the regression and display the estimates
pooled_results <- pool(fit)
summary(pooled_results)
Loading required package: mice
Attaching package: ‘mice’
The following object is masked from ‘package:stats’:
filter
The following objects are masked from ‘package:base’:
cbind, rbind
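Although FIML is unavailable, Python can do chained-equations multiple imputation. Below is a minimal added sketch (not in the original notebook) using the MICE implementation in statsmodels (statsmodels.imputation.mice), applied to the same simulated MAR setup as the earlier Python cells; the exact estimates will differ from Stata and R because the simulated draws and imputations differ.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.imputation import mice

# Set seed for reproducibility
np.random.seed(123)
n = 1000

# Same simulated setup as the earlier Python MAR cells
x1 = np.random.normal(40, 12, n)
x2 = np.random.normal(200, 50, n)
x3 = np.random.normal(150, 5, n)
y = 3*x1 + 4*x2 + 4*x3 + np.random.normal(500, 150, n)

# MAR missingness on x2 driven by y
prob_mar = 1 / (1 + np.exp(y - 2050))
rmar = np.random.binomial(1, prob_mar)
x2[rmar == 0] = np.nan

data = pd.DataFrame({'y': y, 'x1': x1, 'x2': x2, 'x3': x3})

# Chained-equations imputation (predictive mean matching by default),
# then pool OLS estimates across the imputed datasets
imp = mice.MICEData(data)
mi_model = mice.MICE('y ~ x1 + x2 + x3', sm.OLS, imp)
results = mi_model.fit(n_burnin=10, n_imputations=10)
print(results.summary())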