Pandas: How to group by a value in column when there is list in one of the columns
I am trying to group by the values in my "value_1" column, but my last column is made up of lists. When I group by "value_1", the column of lists disappears from the result.



Dataframe:

 value_1    value_2           value_3           list
 american   california, nyc   walmart, kmart    [supermarket, connivence]
 canadian   toronto           dunkinDonuts      [coffee]
 american   texas                               [state]
 canadian                     walmart           [supermarket]
 ...        ...               ...               ...


My expected output is:

 value_1    value_2                  value_3                 list
 american   california, nyc, texas   walmart, kmart          [supermarket, connivence, state]
 canadian   toronto                  dunkinDonuts, walmart   [coffee, supermarket]


Thanks!
python pandas
asked 16 hours ago by johnJones901 (764 reputation)
  • Are all the columns strings, apart from the one list column?

    – jezrael
    16 hours ago











  • Great. And what does print(df.iloc[0].apply(type)) show?

    – jezrael
    16 hours ago











  • OK, so both solutions work.

    – jezrael
    16 hours ago
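A minimal sketch of the type check suggested in the comments, run on a hand-built copy of the sample data. The DataFrame below is an assumption reconstructed from the table in the question, with the blank cells entered as empty strings.

```python
import pandas as pd

# Hypothetical reconstruction of the question's sample data;
# assumption: the blank cells are empty strings
df = pd.DataFrame({
    'value_1': ['american', 'canadian', 'american', 'canadian'],
    'value_2': ['california, nyc', 'toronto', 'texas', ''],
    'value_3': ['walmart, kmart', 'dunkinDonuts', '', 'walmart'],
    'list': [['supermarket', 'connivence'], ['coffee'], ['state'], ['supermarket']],
})

# Python type of every value in the first row
types = df.iloc[0].apply(type)
print(types)
```

With this data every column reports <class 'str'> except list, which reports <class 'list'>.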
2 Answers
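Dynamically build a dictionary that maps every column except value_1 and list to a string-joining function, and aggregate list with a lambda that flattens the lists using a nested list comprehension. (The output below assumes the blank cells are NaN; if they are empty strings, use the empty-string variant of f1 from the explanation.)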



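# f1 joins one group's strings, dropping missing values first
f1 = lambda x: ', '.join(x.dropna())
# alternative: join only string values (skips NaN)
#f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)])
# f2 flattens one group's lists into a single list
f2 = lambda x: [z for y in x for z in y]
# map every column except value_1 and list to f1, then list to f2
d = dict.fromkeys(df.columns.difference(['value_1','list']), f1)
d['list'] = f2

df = df.groupby('value_1', as_index=False).agg(d)
print(df)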
value_1 value_2 value_3
0 american california, nyc, texas walmart, kmart
1 canadian toronto dunkinDonuts, walmart

list
0 [supermarket, connivence, state]
1 [coffee, supermarket]
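A self-contained sketch of the approach above, with the sample data rebuilt by hand (assumption: the blank cells are empty strings). Because of that assumption, f1 here combines the isinstance and empty-string variants listed in the explanation; that combination is an illustration, not the answer's exact code.

```python
import pandas as pd

# Hypothetical reconstruction of the question's sample data;
# assumption: the blank cells are empty strings
df = pd.DataFrame({
    'value_1': ['american', 'canadian', 'american', 'canadian'],
    'value_2': ['california, nyc', 'toronto', 'texas', ''],
    'value_3': ['walmart, kmart', 'dunkinDonuts', '', 'walmart'],
    'list': [['supermarket', 'connivence'], ['coffee'], ['state'], ['supermarket']],
})

# Join one group's strings, skipping NaN and empty strings
f1 = lambda x: ', '.join([y for y in x if isinstance(y, str) and y != ''])
# Flatten one group's lists into a single list
f2 = lambda x: [z for y in x for z in y]

# Builds {'value_2': f1, 'value_3': f1, 'list': f2}
d = dict.fromkeys(df.columns.difference(['value_1', 'list']), f1)
d['list'] = f2

result = df.groupby('value_1', as_index=False).agg(d)
print(result)
```

With these inputs the values match the expected output in the question.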


Explanation:



f1 and f2 are lambda functions.



Drop missing values (if any), then join the strings with the separator:



f1 = lambda x: ', '.join(x.dropna())
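A quick check of the dropna variant on two made-up one-group Series: NaN is removed, but an empty string is not a missing value, so it survives and leaves a trailing separator.

```python
import numpy as np
import pandas as pd

f1 = lambda x: ', '.join(x.dropna())

# NaN is removed by dropna before joining
with_nan = f1(pd.Series(['walmart, kmart', np.nan]))
print(repr(with_nan))    # 'walmart, kmart'

# An empty string is not missing, so dropna keeps it
with_empty = f1(pd.Series(['walmart, kmart', '']))
print(repr(with_empty))  # 'walmart, kmart, '
```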


Keep only string values (this skips missing values, because NaN is a float), then join them with the separator:



f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)])
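A small sketch of the isinstance variant. Plain Python lists stand in for a group's Series here (the comprehension iterates over both the same way); the inputs are made up.

```python
f1 = lambda x: ', '.join([y for y in x if isinstance(y, str)])

# float('nan') is a float, not a str, so it is filtered out
skips_nan = f1(['toronto', float('nan')])
print(repr(skips_nan))    # 'toronto'

# An empty string is still a str, so it is kept (note the trailing separator)
keeps_empty = f1(['toronto', ''])
print(repr(keeps_empty))  # 'toronto, '
```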


Filter out empty strings, then join the remaining values with the separator:



f1 = lambda x: ', '.join([y for y in x if y != '']) 
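A quick check of the empty-string filter on made-up inputs: it removes '', but NaN passes the test and str.join then raises TypeError. f1_both is a hypothetical combination of this filter with the isinstance filter, not part of the original answer.

```python
f1 = lambda x: ', '.join([y for y in x if y != ''])

# Empty strings are removed before joining
only_text = f1(['toronto', ''])
print(repr(only_text))  # 'toronto'

# float('nan') != '' is True, so NaN reaches join, which only accepts str
try:
    f1(['toronto', float('nan')])
    nan_raises = False
except TypeError:
    nan_raises = True
print(nan_raises)  # True

# Hypothetical combination of both filters, for columns mixing '' and NaN
f1_both = lambda x: ', '.join([y for y in x if isinstance(y, str) and y != ''])
both = f1_both(['toronto', '', float('nan')])
print(repr(both))  # 'toronto'
```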


f2 flattens the lists: within each group the list column holds one list per row, e.g. [['a','b'], ['c']], and f2 turns that into ['a', 'b', 'c'].



f2 = lambda x: [z for y in x for z in y]
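A minimal sketch of f2 on the nested list from the text, next to an equivalent standard-library spelling with itertools.chain.from_iterable (shown only for comparison; the answer uses the comprehension).

```python
import itertools

f2 = lambda x: [z for y in x for z in y]

nested = [['a', 'b'], ['c']]
flat = f2(nested)
print(flat)     # ['a', 'b', 'c']

# Equivalent standard-library spelling
chained = list(itertools.chain.from_iterable(nested))
print(chained)  # ['a', 'b', 'c']
```

Both visit each element once. By contrast, sum(nested, []) builds a new list at every +, so its cost grows quadratically with the number of lists.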
answered 16 hours ago by jezrael (342k reputation), edited 15 hours ago
    • @johnJones901 - Can you check whether changing f1 to f1 = lambda x: ', '.join([y for y in x if y != '']) works?

      – jezrael
      16 hours ago






    • @johnJones901 - Answer was edited.

      – jezrael
      15 hours ago











    • @johnJones901 - You are welcome!

      – jezrael
      15 hours ago
You could group by value_1 and aggregate the string columns with the following function:



def fun(x):
    # join the group's strings; missing values are skipped
    return x.str.cat(sep=', ')

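A small sketch of why the blank cells matter for fun: Series.str.cat omits missing values when no na_rep is given, but treats an empty string as an ordinary value. The two-element inputs are made up for illustration.

```python
import numpy as np
import pandas as pd


def fun(x):
    # Same helper as above: concatenate the strings of one group
    return x.str.cat(sep=', ')


# NaN is skipped by str.cat (no na_rep given)
with_nan = fun(pd.Series(['toronto', np.nan], dtype=object))
# An empty string is a real value and leaves a trailing separator
with_empty = fun(pd.Series(['toronto', ''], dtype=object))

print(repr(with_nan))    # 'toronto'
print(repr(with_empty))  # 'toronto, '
```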

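Then use GroupBy.sum to concatenate the lists in the list column. Replace the empty strings with np.nan first, so that str.cat skips them. Avoid df.replace('', None): in older pandas versions, passing value=None makes replace forward-fill from the previous row instead of inserting a missing value, which would put 'texas' into the canadian group and 'dunkinDonuts' into the american group.

import numpy as np

df.replace('', np.nan).groupby('value_1').agg({'list':'sum', 'value_2': fun, 'value_3':fun})

                                      list                 value_2                value_3
value_1
american  [supermarket, connivence, state]  california, nyc, texas         walmart, kmart
canadian             [coffee, supermarket]                 toronto  dunkinDonuts, walmart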

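A self-contained sketch of this answer's approach with the empty strings replaced by np.nan. The sample DataFrame is rebuilt by hand (assumption: blank cells are empty strings), and the list column is flattened with itertools.chain instead of 'sum', so the sketch does not depend on how a particular pandas version sums object columns. join_strings and concat_lists are illustrative names.

```python
import itertools

import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's sample data;
# assumption: the blank cells are empty strings
df = pd.DataFrame({
    'value_1': ['american', 'canadian', 'american', 'canadian'],
    'value_2': ['california, nyc', 'toronto', 'texas', ''],
    'value_3': ['walmart, kmart', 'dunkinDonuts', '', 'walmart'],
    'list': [['supermarket', 'connivence'], ['coffee'], ['state'], ['supermarket']],
})


def join_strings(x):
    # Concatenate one group's strings; NaN entries are skipped by str.cat
    return x.str.cat(sep=', ')


def concat_lists(x):
    # Chain one group's lists into a single flat list
    return list(itertools.chain.from_iterable(x))


str_cols = ['value_2', 'value_3']
clean = df.copy()
# Turn empty strings into real missing values (np.nan, not None)
clean[str_cols] = clean[str_cols].replace('', np.nan)

result = clean.groupby('value_1').agg(
    {'list': concat_lists, 'value_2': join_strings, 'value_3': join_strings}
)
print(result)
```

Only the string columns go through replace here, which keeps the list column untouched.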
answered 16 hours ago by yatu (11.7k reputation), edited 14 hours ago