In Sage, the input is preparsed by the Sage preparser. I'll use 12 instead of 128 so the examples fit in one line. When you input the following:

    sage: import numpy

and then assign perm, you get:

    TypeError Traceback (most recent call last)
    opt/sage/local/lib/python2.7/site-packages/numpy/random/mtrand.so in (numpy/random/mtrand/mtrand.c:21297)()
    opt/sage/local/lib/python2.7/site-packages/numpy/random/mtrand.so in (numpy/random/mtrand/mtrand.c:20965)()

Where you see in particular the line:

    -> 1 perm = (Integer(12))

The preparser has turned the literal 12 into a Sage Integer. However, numpy is not so happy being fed a Sage Integer.

The simplest way to input a raw Python integer is to append r to it. This will work for you:

    sage: perm = (12r)
    array()

Another thing you could do is to turn off the Sage preparser.

Another way is to let Sage transform the Python int to a Sage Integer but then force it to convert it back to a Python integer:

    sage: perm = (int(12))

I know the question is for a pandas df, but in the case where the shuffle occurs by row (column order changed, row order unchanged), the column names do not matter anymore and it could be interesting to use an np.array instead; then np.apply_along_axis() will be what you are looking for. If that is acceptable then this would be helpful. Note that it is easy to switch the axis along which the data is shuffled.

If your pandas DataFrame is named df, maybe you can:

1. get the values of the dataframe with values = df.values,
2. apply the method shown below to shuffle the np.array by row or column,
3. recreate a new (shuffled) pandas df from the shuffled np.array.

For an np.array a:

    # Keep row order, shuffle columns within each row
    print(np.apply_along_axis(np.random.permutation, 1, a))

    # Keep column order, shuffle rows within each column
    print(np.apply_along_axis(np.random.…
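The three steps above (take the values, shuffle them with np.apply_along_axis, rebuild the frame) can be sketched roughly as follows. This is only an illustration under a few assumptions: the helper name shuffle_values, the sample frame, and the seed are made up for the example, and it uses NumPy's Generator API (np.random.default_rng) in place of the np.random.permutation function named in the text.

```python
import numpy as np
import pandas as pd


def shuffle_values(df, axis, rng):
    """Hypothetical helper: shuffle values independently within each row
    (axis=1) or within each column (axis=0), keeping all labels."""
    values = df.to_numpy()  # step 1: plain array of the frame's values
    # step 2: permute every 1-D slice taken along `axis` on its own
    shuffled = np.apply_along_axis(rng.permutation, axis, values)
    # step 3: rebuild a frame around the shuffled array, reusing the labels
    return pd.DataFrame(shuffled, index=df.index, columns=df.columns)


rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
df = pd.DataFrame({"a": [1, 2, 3, 4],
                   "b": [10, 20, 30, 40],
                   "c": [100, 200, 300, 400]})

by_row = shuffle_values(df, axis=1, rng=rng)  # keep row order, shuffle within each row
by_col = shuffle_values(df, axis=0, rng=rng)  # keep column order, shuffle within each column
print(by_row)
print(by_col)
```

Because the labels are reattached after shuffling, a column name no longer says where a value originally came from once rows have been shuffled, which is the caveat raised in the answer.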
What's a simple and efficient way to shuffle a dataframe in pandas, by rows or by columns? I.e. how to write a function shuffle(df, n, axis=0) that takes a dataframe, a number of shuffles n, and an axis (axis=0 is rows, axis=1 is columns) and returns a copy of the dataframe that has been shuffled n times.

Edit: key is to do this without destroying the row/column labels of the dataframe. If you just shuffle df.index that loses all that information. I want the resulting df to be the same as the original except with the order of rows or order of columns different.

Edit2: My question was unclear. When I say shuffle the rows, I mean shuffle each row independently. So if you have two columns a and b, I want each row shuffled on its own, so that you don't have the same associations between a and b as you do if you just re-order each row as a whole. Something like:

    for 1...n:

But hopefully more efficient than naive looping. This does not work for me:

    def shuffle(df, n, axis=0):
        shuffled_df.apply(np.random.shuffle(shuffled_df.values),axis=axis)

    df = pandas.DataFrame()

Instead of going through df.apply,

    In : %timeit df.apply(, axis=1)

you can loop over views of the underlying array, with "for view in numpy.rollaxis(df.values, 0):" for rows or "for view in numpy.rollaxis(df.values, 1):" for columns, and shuffle each view in place.

    Out: (10, 2) # we can iterate over 10 arrays with shape (2,) (rows)
    Out: (2, 10) # we can iterate over 2 arrays with shape (10,) (columns)

Note that numpy.rollaxis brings the specified axis to the first dimension and then lets us iterate over arrays with the remaining dimensions, i.e., if we want to shuffle along the first dimension (columns), we need to roll the second dimension to the front, so that we apply the shuffling to views over the first dimension.

Your final function then uses a trick to bring the result in line with the expectation for applying a function to an axis:

    def shuffle(df, n=1, axis=0):
        axis = int(not axis)  # pandas.DataFrame is always 2D
        for _ in range(n):
            for view in numpy.rollaxis(df.values, axis):
                numpy.random.shuffle(view)
        return df

For reference, numpy.random.permutation is documented as: Randomly permute a sequence, or return a permuted range. If x is an integer, randomly permute np.arange(x). If x is an array, make a copy and shuffle the elements randomly.
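As a rough sketch of the shuffle(df, n, axis) function asked for, here is one way to combine the rollaxis-over-views idea with the requirement that labels survive and a copy is returned. Several things here are assumptions rather than part of the original posts: the optional rng parameter, the use of np.random.default_rng, the sample frame, and the choice to shuffle a private copy of the values and rebuild the frame instead of writing through df.values.

```python
import numpy as np
import pandas as pd


def shuffle(df, n=1, axis=0, rng=None):
    """Return a copy of df whose values were shuffled n times,
    within each column (axis=0) or within each row (axis=1)."""
    rng = np.random.default_rng() if rng is None else rng  # rng is an added, optional argument
    values = df.to_numpy(copy=True)  # private copy, so df itself is untouched
    roll = int(not axis)             # a DataFrame is always 2-D
    for _ in range(n):
        # rollaxis moves `roll` to the front, so each `view` is one column
        # (axis=0) or one row (axis=1) of `values`, not a copy of it
        for view in np.rollaxis(values, roll):
            rng.shuffle(view)        # in place, so `values` changes too
    # reattach the original row and column labels
    return pd.DataFrame(values, index=df.index, columns=df.columns)


df = pd.DataFrame({"a": range(10), "b": range(10, 20)})
out = shuffle(df, n=5, axis=0, rng=np.random.default_rng(1))
print(out)
```

Shuffling a copy is a deliberate choice in this sketch: df.values is not guaranteed to be a writable view of the frame (with mixed column dtypes, for example, it is a newly built array), so in-place shuffles on it may not reach the DataFrame.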