# Generating data with random Gaussian noise

Modeling, Statistics. Posted by Nikolay Manchev, March 28, 2017

I recently needed to generate some data for $y$ as a function of $x$, with some added Gaussian noise. This comes in handy when you want to generate data with an underlying regularity that you want to discover, for example when testing different machine learning algorithms.

What I wanted was a mechanism that lets me specify a range for $x$ and then generate data using

$$y = f(x) + \epsilon$$

with the ability to control the function $f(x)$ and the parameters of the Gaussian noise $\epsilon$.

I came up with this simple function, which allows me to specify $f(x)$, the $x$ interval and step, and the Gaussian distribution parameters ($\mu$ and $\sigma$).

```python
import numpy as np

def corr_vars(start=-10, stop=10, step=0.5, mu=0, sigma=3, func=lambda x: x):
    # Generate x
    x = np.arange(start, stop, step)

    # Generate random noise
    e = np.random.normal(mu, sigma, x.size)

    # Generate y values as y = func(x) + e
    y = np.zeros(x.size)

    for ind in range(x.size):
        y[ind] = func(x[ind]) + e[ind]

    return (x, y)
```
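If `func` can operate on a whole NumPy array at once, the element-by-element loop is not strictly needed. The following is a minimal sketch of such a variant, not part of the original code: the name `corr_vars_vectorized` and the `seed` parameter are hypothetical, and it assumes `func` accepts and returns an array (so `np.sin` works, while `math.sin` would not).

```python
import numpy as np

def corr_vars_vectorized(start=-10, stop=10, step=0.5, mu=0, sigma=3,
                         func=lambda x: x, seed=None):
    # Hypothetical variant: a local Generator keeps the global random state untouched
    rng = np.random.default_rng(seed)

    # Generate x
    x = np.arange(start, stop, step)

    # Generate Gaussian noise with mean mu and standard deviation sigma
    e = rng.normal(mu, sigma, x.size)

    # Assumption: func accepts a NumPy array and returns an array of the same shape
    y = func(x) + e

    return x, y

x, y = corr_vars_vectorized(func=lambda x: 2 * np.pi * np.sin(x), seed=2)
```

Passing the same `seed` twice should reproduce the same noise, which helps when comparing algorithms on identical data.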

Here are two examples of using the function to generate two data sets: one using $y = x + \epsilon$, the other using $y = 2\pi\sin(x) + \epsilon$.

```python
import numpy as np
import matplotlib.pyplot as plt
# pi and sin are assumed to come from the math module
from math import pi, sin

np.random.seed(2)

(x0, y0) = corr_vars(sigma=3)
(x1, y1) = corr_vars(sigma=3, func=lambda x: 2 * pi * sin(x))

f, axarr = plt.subplots(2, sharex=True, figsize=(7, 7))

# Top panel: noisy data and the noiseless line y = x
axarr[0].scatter(x0, y0)
axarr[0].plot(x0, x0, color='r')
axarr[0].set_title('y = x + e')
axarr[0].grid(True)

# Bottom panel: noisy data and the noiseless curve y = 2*pi*sin(x)
axarr[1].scatter(x1, y1)
axarr[1].plot(x1, 2 * pi * np.sin(x1), color='r')
axarr[1].set_title('y = 2*π*sin(x) + e')
axarr[1].grid(True)

plt.show()
```

The snippet above plots the resulting data sets, together with the noiseless function (in red) for comparison.
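To illustrate the "underlying regularity" idea from the introduction, here is a rough sketch of recovering the linear relationship from noisy data with an ordinary least-squares fit. It is not taken from the original post: it regenerates $y = x + \epsilon$ data inline with its own random generator, and the variable names are only illustrative.

```python
import numpy as np

# Recreate a y = x + e data set (mu = 0, sigma = 3), independently of corr_vars
rng = np.random.default_rng(2)
x = np.arange(-10, 10, 0.5)
y = x + rng.normal(0, 3, x.size)

# Fit a degree-1 polynomial; np.polyfit returns the highest-order coefficient first
slope, intercept = np.polyfit(x, y, 1)

# Spread of the residuals, a rough estimate of the noise sigma
residual_std = np.std(y - (slope * x + intercept))

print(f"slope={slope:.3f}, intercept={intercept:.3f}, residual std={residual_std:.3f}")
```

With enough points, the fitted slope and intercept should land near 1 and 0, and the residual spread near the chosen $\sigma$. The exact values depend on the seed.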

The full source code is available on GitHub.

The original post is located at cleverowl.uk