
{"id":148,"date":"2024-07-31T03:03:50","date_gmt":"2024-07-30T19:03:50","guid":{"rendered":"https:\/\/www.shirui.me\/blog\/?p=148"},"modified":"2024-07-31T03:59:25","modified_gmt":"2024-07-30T19:59:25","slug":"a-simple-batch-gpc-data-processing-program","status":"publish","type":"post","link":"https:\/\/www.shirui.me\/blog\/2024\/07\/31\/a-simple-batch-gpc-data-processing-program\/","title":{"rendered":"A simple batch GPC-data processing program"},"content":{"rendered":"\n<p>To begin with, I must emphasize that this program was not originally deemed necessary. The story dates back to a lab meeting almost six years ago. During the meeting, a student using the GPC instrument complained that the software couldn&#8217;t handle batch data processing because it was a &#8220;premium&#8221; feature, requiring an additional ~$5000 USD fee. After the meeting, I decided to write this program. It includes two main functions: a baseline correction function and a GPC fitting function, which allows data to be further corrected using Mark-Houwink equations. I had forgotten about this program for a long time until recently, when I encountered a peak-finding problem that required baseline correction, and I rediscovered this old program.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"language-python\">import numpy as np\nfrom scipy import signal\nfrom scipy import sparse\nfrom scipy.sparse.linalg import spsolve\n\n\n# Function for baseline detection using Continuous Wavelet Transform (CWT)\ndef baseline_cwt_br(t, y, widths=np.arange(1, 61), wavelet=signal.ricker, theta_iter=2, deg=5):\n    r'''Automatic baseline detect algorithm via CWT.\n    by Carlo G. Bertinetto and Tapani Vuorinen, 2014\n    DOI: 10.1366\/13-07018, Volume 68, Number 2, 2014. 
APPLIED SPECTROSCOPY\n    '''\n    # Compute the Continuous Wavelet Transform (CWT) of the input signal\n    cwt_mat = signal.cwt(y, wavelet, widths)\n\n    # Compute entropy for each scale (width) to find the optimal scale\n    cwt_mat_p = np.abs(cwt_mat)  # Take absolute value to handle negative peaks\n    cwt_mat_n = cwt_mat_p \/ np.sum(np.abs(cwt_mat_p), axis=1)[:, None]  # Normalize\n    H_w = -np.sum(np.nan_to_num(cwt_mat_n * np.nan_to_num(np.log(cwt_mat_n))), axis=1)  # Compute entropy\n\n    # Find the scale (width) with the minimum entropy\n    ind_wh = np.argmin(H_w)\n    wh = widths[ind_wh]\n    cwt_min = cwt_mat[ind_wh]\n\n    # Evaluate threshold using histogram method\n    n = 200  # Initial number of bins\n    _count = 0\n    while _count < theta_iter:\n        n_old = n\n        p, e = np.histogram(cwt_min, bins=n)  # Compute histogram\n        e = e[:-1]\n        ind = np.logical_and(p >= p.max() \/ 3, p <= p.max())\n        sigma = (np.sum((e[ind])**2 * p[ind]) \/ p[ind].sum())**0.5  # Compute standard deviation\n        n = int(8 * (cwt_min.max() - cwt_min.min()) \/ sigma)  # Update number of bins\n        if (n - n_old) \/ n_old < 0.05:\n            _count += 1\n\n    # Define baseline indices based on the threshold\n    n_ind = np.logical_and(cwt_min > -3 * sigma, cwt_min < 3 * sigma)\n    theta = sigma * (0.6 + 10 * cwt_min[n_ind].shape[0] \/ cwt_min.shape[0])\n    baseline_ind = np.logical_and(cwt_min > -theta, cwt_min < theta)\n\n    # Iterative polynomial fitting until convergence\n    b_p = y[baseline_ind]\n    b_t = t[baseline_ind]\n    converge = False\n    while not converge:\n        n = b_p.shape[0] \/\/ 100  # Segment length\n        b_p_old = np.copy(b_p)  # Copy previous baseline points\n        p = np.polyfit(b_t, b_p, deg=6)  # Fit polynomial of degree 6\n        pi = np.poly1d(p)(b_t)  # Evaluate polynomial\n        std = np.mean((pi - b_p)**2) ** 0.5  # Compute standard deviation\n        ind1 = b_p - pi < 1.5 * std  # Identify 
points within 1.5 * std\n        b_p = b_p[ind1]  # Update baseline points\n        b_t = b_t[ind1]\n        p = np.polyfit(b_t, b_p, deg=deg)  # Fit polynomial of specified degree\n        pi = np.poly1d(p)(t)  # Evaluate polynomial\n        ind2 = y - pi < 0  # Identify points below the polynomial\n        segs = np.cumsum(np.r_[\n            0, np.diff(np.flatnonzero(np.diff(ind2)) + 1, prepend=0, append=ind2.shape[0])\n        ])\n        for i in range(1, segs.shape[0]):\n            seg = ind2[segs[i-1]:segs[i]]\n            if seg[0] and (seg.shape[0] >= n):\n                b_p = np.append(b_p, y[segs[i-1] + seg.shape[0] \/\/ 2])\n                b_t = np.append(b_t, t[segs[i-1] + seg.shape[0] \/\/ 2])\n\n        # Check for convergence\n        if b_p_old.shape[0] == b_p.shape[0]:\n            if np.allclose(b_p_old, b_p):\n                converge = True\n\n    return np.poly1d(p)(t)  # Return the computed baseline (p holds polyfit coefficients, so wrap it in poly1d before evaluating)\n<\/code><\/pre>\n\n\n\n<p>This baseline-correction algorithm was adapted from a method in which the authors take an iterative approach: the baseline, identified from the slow modes of the CWT, is fit with polynomials, and outliers are removed in each iteration until the baseline converges. 
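<\/p>\n\n\n\n<p>The iterative fit-and-reject step at the heart of this routine can be sketched in isolation. Below is a minimal, self-contained illustration (the helper name, the synthetic linear baseline, Gaussian peak, noise level, and the fixed 1.5*std cutoff are assumptions for demonstration, not the exact routine above):<\/p>

```python
import numpy as np

def iterative_polyfit_baseline(t, y, deg=1, n_iter=20):
    """Iteratively fit a polynomial baseline: points rising more than
    1.5 * std above the current fit (i.e. peaks) are discarded each pass."""
    b_t, b_p = t.copy(), y.copy()
    for _ in range(n_iter):
        coef = np.polyfit(b_t, b_p, deg=deg)
        fit = np.poly1d(coef)(b_t)
        std = np.sqrt(np.mean((fit - b_p) ** 2))
        keep = b_p - fit < 1.5 * std  # drop points far above the fit (peaks)
        if keep.all():
            break  # converged: no outliers left
        b_t, b_p = b_t[keep], b_p[keep]
    # Evaluate the final polynomial on the full grid
    return np.poly1d(np.polyfit(b_t, b_p, deg=deg))(t)

# Synthetic chromatogram: sloped baseline + one Gaussian peak + mild noise
rng = np.random.default_rng(0)
t = np.linspace(0, 30, 600)
true_base = 0.02 * t + 0.5
y = true_base + np.exp(-0.5 * (t - 15.0) ** 2) + 0.01 * rng.standard_normal(t.size)
base = iterative_polyfit_baseline(t, y, deg=1)
```

\n\n\n\n<p>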
This method is adequate for signals with a high signal-to-noise ratio; for noisy systems, the signal should be smoothed first.<\/p>\n\n\n\n<p>The second part handles the GPC processing; the code is below:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"language-python\">import argparse\nimport re\nimport numpy as np\nimport pandas as pd\nfrom scipy.integrate import simps\nfrom scipy.signal import detrend\nfrom matplotlib import pyplot as plt\nfrom baseline import baseline_cwt_br\n\n\ndescription = \"\"\"\nA simple GPC data analysis program, for\nI aint f**king believe that a simple GPC\ndata analysis program is\nworth 30000 RMB aka $5000 USD.\n\"\"\"\n\narg_parser = argparse.ArgumentParser(\n    description=description, formatter_class=argparse.RawTextHelpFormatter)\narg_parser.add_argument('-S', '--standard', required=True,\n                        dest='standard', type=float, metavar=\"standard curve\",\n                        nargs=2, help=\"Standard curve, slope `k' and intercept `b0'.\")\narg_parser.add_argument('-O', '--output',\n                        metavar='Output file',\n                        default='mwd', dest='out_put',\n                        help=\"Optional, use `mwd' as default prefix.\")\narg_parser.add_argument('gpc_file(s)',\n                        nargs='+', type=str,\n                        help='GPC data file(s).')\narg_parser.add_argument('-M', '--method',\n                        nargs='+', type=str, dest='method', metavar='Detector(s)',\n                        default=['RI'], help='GPC detector(s).')\n\n\nargs = arg_parser.parse_args()\nalvars = vars(args)\nk, b0 = alvars['standard']\n#k, b0 = -0.5005413052736007, 11.278597269990836  # test data.\n# TODO: add combination mode of detectors.\ndetector = alvars['method']\n#mark_houwink = pd.read_csv('mh.dat', sep='_')\n# MH equation: K_1 M_1^{a_1 + 1} = K_2 M_2^{a_2 + 1}, universal calibration 
curve.\n# k2, b0_2 = k*(a_1 + 1)\/(a_2 + 1), ((a_1 + 1)*b0 + np.log(K_1\/K_2)\/np.log(10))\/(a_2 + 1)  # MH equation to transfer M1 -> M2\n# k2, b0_2 as functions of K, a, if we know k, b0 and K_1, a_1 for a certain polymer.\n\n\n\ndef std_func(t, k, b0):\n    r'''GPC standard curve t->M.'''\n    return np.exp((k*t + b0)*np.log(10))\n\n\n\n# TODO: add detrend and find_peak here, or perhaps a GUI?\n# RI signal = K_{RI} * dn\/dc * c\n# LS signal = K_{LS} * (dn\/dc)^2 * Mw * c\n# Visc. signal = K_{visc} * IntrinsicViscosity * c\n\nret = []\nfiles = alvars['gpc_file(s)']\nfor fi, f in enumerate(files):\n\tdata = np.loadtxt(f)\n\tt, y = data.T[0], data.T[1]\n\twhile True:\n\t\tplt.plot(t, y)\n\t\tplt.show()\n\t\ts, e = [float(_) for _ in re.split(r',\s*|\s+', input(\"Trim the GPC, start, end: \")) if _ != '']\n\t\tinds = np.logical_and(t>s, t&lt;e)\n\t\tt1 = t[inds]\n\t\ty1 = y[inds]\n\t\tplt.plot(t1, y1)\n\t\tplt.show()\n\t\tif input(\"End trim? [y|n] \") == 'y':  # trim data\n\t\t\tbreak\n\tb = baseline_cwt_br(t1, y1)  # baseline correction\n\tplt.plot(t1, y1 - b)\n\tplt.show()\n\tt = t1\n\ty = y1 - b  # use the baseline-corrected signal downstream\n\tm, inten = std_func(t, k, b0), y\n\tif 'RI' in detector:\n\t\tp = inten\/m * 1\/abs(k*np.log(10))  # P(t) -> P(M); the 1\/m factor transforms P(log M) -> P(M)\n\telse:\n\t\tcontinue  # only the RI detector is implemented so far\n\tnp.savetxt('%s_%03d.txt' % (alvars['out_put'], fi), np.vstack([m, np.abs(p\/simps(p, m))]).T, fmt='%.6f')\n<\/code><\/pre>\n\n\n\n<p>There are still unresolved &#8220;TODO&#8221; items, which I hope to complete someday. These may include new functions, such as deconvolution for two-step polymerization.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>To begin with, I must emphasize that this program was not originally deemed necessary. The story dates back to a lab meeting almost six years ago. 
During the meeting, a student using the GPC instrument complained that the software couldn&#8217;t handle batch data processing because it was a &#8220;premium&#8221; feature, requiring an additional ~$5000 USD [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,9],"tags":[],"class_list":["post-148","post","type-post","status-publish","format-standard","hentry","category-misc","category-notes"],"_links":{"self":[{"href":"https:\/\/www.shirui.me\/blog\/wp-json\/wp\/v2\/posts\/148","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.shirui.me\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.shirui.me\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.shirui.me\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.shirui.me\/blog\/wp-json\/wp\/v2\/comments?post=148"}],"version-history":[{"count":5,"href":"https:\/\/www.shirui.me\/blog\/wp-json\/wp\/v2\/posts\/148\/revisions"}],"predecessor-version":[{"id":191,"href":"https:\/\/www.shirui.me\/blog\/wp-json\/wp\/v2\/posts\/148\/revisions\/191"}],"wp:attachment":[{"href":"https:\/\/www.shirui.me\/blog\/wp-json\/wp\/v2\/media?parent=148"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.shirui.me\/blog\/wp-json\/wp\/v2\/categories?post=148"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.shirui.me\/blog\/wp-json\/wp\/v2\/tags?post=148"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}