Monday, December 3, 2018

python - Provide a reproducible copy of the DataFrame with to_clipboard()




2018-09-18_reproducible_dataframe.ipynb



This was marked as a duplicate; however, the other question and its answers do not cover to_clipboard, whereas this question specifically covers .to_clipboard and is more succinct.




This may seem like an obvious question. However, many of the users asking questions about Pandas are new and inexperienced. A critical component of asking a question is How to create a Minimal, Complete, and Verifiable example, which explains "what" and "why", but not "how".



For example, as the OP, I may have the following dataframe:




  • For this example, I've created synthetic data, which is one option for creating a reproducible dataset, but that is not within the scope of this question.


    • Think of this as if you've loaded a file and only need to share a bit of it to reproduce the error.





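import pandas as pd
import numpy as np

# Seed the generator so the integer columns are reproducible
np.random.seed(365)
data = {'a': [np.random.randint(10) for _ in range(15)],
        'b': [np.random.randint(10) for _ in range(15)],
        # pd.datetime is no longer available in current pandas; pd.Timestamp.today() replaces it.
        # The dates start on the day the code is run, so they will differ from the output shown.
        'date': pd.bdate_range(pd.Timestamp.today(), periods=15).tolist()}

df = pd.DataFrame(data)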

    a  b       date
0   2  0 2019-11-06
1   4  8 2019-11-07
2   1  4 2019-11-08
3   5  3 2019-11-11
4   2  2 2019-11-12
5   2  6 2019-11-13
6   9  2 2019-11-14
7   8  6 2019-11-15
8   4  8 2019-11-18
9   0  9 2019-11-19
10  3  6 2019-11-20
11  3  1 2019-11-21
12  7  6 2019-11-22
13  7  5 2019-11-25
14  7  7 2019-11-26



The dataframe could be followed by some other code that produces an error or doesn't produce the desired outcome.



Things that should be provided when asking a question on Stack Overflow:




  • A well-written, coherent question

  • The code that produces the error

  • The error stack

  • Potentially, the expected outcome of some code

  • The data, in an easily usable form



Answer



Quickest method to provide sample data from a pandas DataFrame



There is more than one way to answer this question. However, this answer isn't meant to provide an exhaustive solution. It provides the simplest method possible. For the curious, there are other, more verbose solutions provided on Stack Overflow.




  1. Provide a link to a shareable dataset (maybe on GitHub or a shared file on Google). This is particularly useful if it's a large dataset and the objective is to optimize some method. The drawback is that the data may no longer be available in the future, which reduces the benefit of the post.



    • Data must be provided in the question, but can be accompanied by a link to a more extensive dataset.

    • Do not post only a link or an image of the data.


  2. Provide the output of df.head(10).to_clipboard(sep=',', index=False)



Code:



Provide the output of pandas.DataFrame.to_clipboard




df.head(10).to_clipboard(sep=',', index=False)



  • If you have a multi-index DataFrame or an index other than 0...n, use index=True and provide a note in your question as to which column(s) are the index.

  • Note: when the previous line of code is executed, no output will appear. The result of the code is now in the clipboard.

  • Paste the clipboard into a code block in your question.



a,b,date
2,0,2019-11-06
4,8,2019-11-07
1,4,2019-11-08
5,3,2019-11-11
2,2,2019-11-12
2,6,2019-11-13
9,2,2019-11-14
8,6,2019-11-15
4,8,2019-11-18
0,9,2019-11-19
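For the index=True case mentioned in the bullets above, here is a minimal sketch of what the shared text looks like and how a reader could restore the index. The frame df_idx and the index name 'key' are made up for illustration. to_csv and read_csv over a string stand in for to_clipboard and read_clipboard so the snippet runs without a system clipboard; the comma-separated text should be the same, but treat that as an assumption to check on your own machine.

```python
import io
import pandas as pd

# Hypothetical frame with a named, non-default index (not the df from the question)
df_idx = pd.DataFrame(
    {'a': [2, 4, 1], 'b': [0, 8, 4]},
    index=pd.Index(['x', 'y', 'z'], name='key'),
)

# Stand-in for df_idx.to_clipboard(sep=',', index=True):
# the index is written as the first column of the text
text = df_idx.to_csv(sep=',', index=True)
print(text)

# Stand-in for pd.read_clipboard(sep=',', index_col='key'):
# index_col tells the reader which column to turn back into the index
restored = pd.read_csv(io.StringIO(text), sep=',', index_col='key')
print(restored)
```

Without the note about which column is the index, a reader would get 'key' back as an ordinary column next to a fresh 0...n index.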




  • Someone trying to answer your question can copy the pasted text to the clipboard and then run:



df = pd.read_clipboard(sep=',')
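One thing to watch when reading the text back: the date column arrives as plain strings unless the reader asks for dates. The sketch below uses the first three rows of the pasted output and parses them from a string with read_csv, so it runs without a clipboard; read_clipboard forwards its keyword arguments to read_csv, so parse_dates=['date'] should work the same way there.

```python
import io
import pandas as pd

# First three rows of the pasted text from the question
pasted = """a,b,date
2,0,2019-11-06
4,8,2019-11-07
1,4,2019-11-08
"""

# Equivalent parsing to pd.read_clipboard(sep=','): 'date' is read as text
df_plain = pd.read_csv(io.StringIO(pasted), sep=',')
print(df_plain.dtypes)

# Equivalent to pd.read_clipboard(sep=',', parse_dates=['date']): 'date' becomes datetime64
df_dates = pd.read_csv(io.StringIO(pasted), sep=',', parse_dates=['date'])
print(df_dates.dtypes)
```

If the question depends on datetime behaviour, it helps to say so, so that answerers know to pass parse_dates.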
