{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Breast Cancer\n\nThis example generates a Mapper built from the `Wisconsin Breast Cancer Dataset`_.\n\n\n\nThe reasoning behind the choice of lenses in the demonstration below is:\n\n- **For lens1:** Lenses that make biological sense; in other words, lenses that highlight special features in the data, that I know about.\n- **For lens2:** Lenses that disperse the data, as opposed to clustering many points together.\n\nIn the case of this particular data, using an anomaly score (in this case calculated using the IsolationForest from sklearn) makes biological sense since cancer cells are anomalous. For the second lens, we use the $l^2$ norm.\n\nFor an interactive exploration of lens for the breast cancer, see the `Choosing a lens notebook <../../notebooks/Cancer-demo.html>`_.\n\nKeplerMapper also permits setting multiple datapoint color functions and node color functions in its html visualizations.\nThe example code below demonstrates three ways this might be done. The rendered visualizations are also viewable:\n\n- `Visualization of the breat cancer mapper using multiple datapoint color functions <../../_static/breast-cancer-multiple-color-functions.html>`_\n- `Visualization of the breat cancer mapper using multiple node color functions <../../_static/breast-cancer-multiple-node-color-functions.html>`_\n- `Visualization of the breat cancer mapper using multiple datapoint and node color functions <../../_static/breast-cancer-multiple-color-functions-and-multiple-node-color-functions.html>`_\n\n<img src=\"file://../../../examples/breast-cancer/breast-cancer.png\">\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import sys\n\ntry:\n    import pandas as pd\nexcept ImportError as e:\n    print(\n        \"pandas is required for this example. Please install with `pip install pandas` and then try again.\"\n    )\n    sys.exit()\n\nimport numpy as np\nimport kmapper as km\nimport sklearn\nfrom sklearn import ensemble\n\n# For data we use the Wisconsin Breast Cancer Dataset\n# Via:\ndf = pd.read_csv(\"data/breast-cancer.csv\")\nfeature_names = [c for c in df.columns if c not in [\"id\", \"diagnosis\"]]\ndf[\"diagnosis\"] = df[\"diagnosis\"].apply(lambda x: 1 if x == \"M\" else 0)\nX = np.array(df[feature_names].fillna(0))  # quick and dirty imputation\ny = np.array(df[\"diagnosis\"])\n\n# We create a custom 1-D lens with Isolation Forest\nmodel = ensemble.IsolationForest(random_state=1729)\nmodel.fit(X)\nlens1 = model.decision_function(X).reshape((X.shape[0], 1))\n\n# We create another 1-D lens with L2-norm\nmapper = km.KeplerMapper(verbose=3)\nlens2 = mapper.fit_transform(X, projection=\"l2norm\")\n\n# Combine both lenses to create a 2-D [Isolation Forest, L^2-Norm] lens\nlens = np.c_[lens1, lens2]\n\n# Create the simplicial complex\ngraph = mapper.map(\n    lens,\n    X,\n    cover=km.Cover(n_cubes=15, perc_overlap=0.4),\n    clusterer=sklearn.cluster.KMeans(n_clusters=2, random_state=1618033),\n)\n\n# Visualization\nmapper.visualize(\n    graph,\n    path_html=\"output/breast-cancer.html\",\n    title=\"Wisconsin Breast Cancer Dataset\",\n    custom_tooltips=y,\n)\n\n\n# Visualization with multiple color functions\nmapper.visualize(\n    graph,\n    path_html=\"output/breast-cancer-multiple-color-functions.html\",\n    title=\"Wisconsin Breast Cancer Dataset\",\n    custom_tooltips=y,\n    color_values=lens,\n    color_function_name=[\"Isolation Forest\", \"L2-norm\"],\n)\n\n\n# Visualization with multiple node color functions\nmapper.visualize(\n    graph,\n    path_html=\"output/breast-cancer-multiple-node-color-functions.html\",\n    title=\"Wisconsin Breast Cancer Dataset\",\n    custom_tooltips=y,\n    node_color_function=[\"mean\", \"std\", \"median\", \"max\"],\n)\n\n# Visualization showing both multiple color functions, and also multiple node color functions\nmapper.visualize(\n    graph,\n    path_html=\"output/breast-cancer-multiple-color-functions-and-multiple-node-color-functions.html\",\n    title=\"Wisconsin Breast Cancer Dataset\",\n    custom_tooltips=y,\n    color_values=lens,\n    color_function_name=[\"Isolation Forest\", \"L2-norm\"],\n    node_color_function=[\"mean\", \"std\", \"median\", \"max\"],\n)\n\n\nimport matplotlib.pyplot as plt\n\nkm.draw_matplotlib(graph)\nplt.show()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}