The dating simulation demo most closely mirrors the primary application of agile machine learning at if(we). While simulated data can never mimic the full richness of user behavior, it provides a convenient encapsulation and allows us to share our work without putting at risk user privacy.
Configure environment
If you haven’t done so in the context of other demos you will need to set the environment variable $ANTELOPE_TRAINING to a path on the local filesystem.
mkdir $ANTELOPE_TRAINING
export ANTELOPE_TRAINING=$ANTELOPE_DEMO/training
Generate training data
First create an empty weights file—we’ll fill it in later.
touch $ANTELOPE_TRAINING/r_logit_coef_dating.txt
Within sbt execute the following commands:
project demo-dating-simulation
runMain co.ifwe.antelope.datingdemo.exec.Train
The simulation will begin to simulate a users joining a dating site and interacting with one another. The model has access to user profile information and observed user behavior, but it cannot access the randomized parameters describing user preferences or user activity levels. Periodically users the model for recommendations, which it must select, aiming to optimize mutual compatibility.
Training runs from timestamp 0 up to 2,000,000. During this time period the model provides random recommendations. You can see the data by viewing the training file on the command line:
less $ANTELOPE_TRAINING/demo_dating_training_data.csv
We have provided a script to train the model using R. Execute it on the command line and view the output at follows
cd $ANTELOPE_TRAINING
r --no-save < $ANTELOPE_DEMO/antelope/scripts/r/train_dating.r
cat r_logit_coef_dating.txt
Your coefficients should look like the following
0.8846154,-0.01354816,-1.190565e-07,0.06699019,-0.008815056,-0.002402504,-0.09513831,0.02153472
Score the model
Back within sbt execute the following commands:
runMain co.ifwe.antelope.datingdemo.exec.Score
this runs quickly for the until the timestamp reaches 2,000,000, slowing after that as the simulation calls on the model recommendations rather than on random recommendations for a fraction of the requests. The simulation is configured to run up to timestamp 2,500,000.
Final output shows something like the following.
DatingModel 6362|1390|1318|79 : 21.8% 94.8% 6.0% : 1.2%
RandomRecommendation 1894625|102066|100173|13672 : 5.4% 98.1% 13.6% : 0.7%
The number that matters most is the final percentage: the DatingModel makes matches at a rate of 1.2% whereas the RandomRecommendation model makes matches only 0.7% of the time.
TODO: section to be extended with further discussion of simulation internals.
< Back to Documentation Index